Paper : 2305.11206
Superficial Alignment Hypothesis
Less is More for Alignment (LIMA) was a paper published by Meta that explored fine-tuning on a small set of carefully curated examples. It challenges the notion that RLHF is what imbues models with knowledge and capabilities, and instead argues two main points:
- Almost all knowledge in large language models is learnt during pretraining
- Only limited instruction tuning data is necessary to teach models to produce high-quality output
In short, alignment can be a simple process in which the model learns the style or format for interacting with users. They show this by training LIMA, a pretrained 65B-parameter LLaMA model fine-tuned on their own dataset.
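To make the setup concrete, below is a minimal sketch of this kind of supervised fine-tuning on a small curated prompt/response set, assuming a Hugging Face-style stack. The checkpoint name, prompt formatting, and hyperparameters are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of supervised fine-tuning on a small curated prompt/response set.
# Checkpoint name, formatting, and hyperparameters are illustrative assumptions.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

class CuratedPairs(Dataset):
    """Turns (prompt, response) pairs into token sequences for causal LM training."""
    def __init__(self, pairs, tokenizer, max_len=2048):
        self.examples = []
        for prompt, response in pairs:
            text = prompt + "\n\n" + response + tokenizer.eos_token
            ids = tokenizer(text, truncation=True, max_length=max_len,
                            return_tensors="pt").input_ids[0]
            # For causal LM fine-tuning, the labels are the input ids themselves.
            self.examples.append({"input_ids": ids, "labels": ids.clone()})

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]

model_name = "huggyllama/llama-65b"  # hypothetical stand-in for the 65B base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# In practice this would be the ~1,000 curated examples.
pairs = [("How do I sort a list in Python?",
          "Use the built-in sorted() function, e.g. sorted(my_list) ...")]
train_ds = CuratedPairs(pairs, tokenizer)

args = TrainingArguments(output_dir="lima-style-sft", num_train_epochs=15,
                         per_device_train_batch_size=1, learning_rate=1e-5)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```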
Dataset
The dataset is generated by curating 1,000 examples that approximate real user prompts and high-quality responses.
- 750 top questions and answers are selected from community forums: Stack Exchange (where they sampled answers from a total of 75 STEM exchanges and filtered out ones that might be too niche), Reddit, and wikiHow
- An additional 250 examples of prompts and responses are manually written
Questions and answers that were ultimately selected were filtered on the following criteria (a rough filtering sketch follows the list):
- The helpfulness of the responses
- How self-contained the response was - they discarded answers that referenced other answers or outside information
- Length of the response - answers that were too short (< 1200 characters) or too long (> 4096 characters) were removed from the dataset
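As a rough illustration of these filters, the sketch below applies the length bounds and a naive check for answers that reference other answers. The regex patterns are assumptions for illustration; the paper's actual curation also involved manual review.

```python
# Hedged sketch of the length / self-containedness filtering described above.
import re

MIN_CHARS = 1200   # answers shorter than this were removed
MAX_CHARS = 4096   # answers longer than this were removed

# Crude patterns suggesting an answer refers to other answers or outside context
# (illustrative assumptions, not the paper's actual heuristics).
REFERENCE_PATTERNS = [
    r"as mentioned in (another|the other) answer",
    r"see the (accepted|top) answer",
    r"^edit:",   # edits often respond to comments rather than the question
]

def keep_answer(answer: str) -> bool:
    """Return True if an answer passes the length and self-containedness checks."""
    if not (MIN_CHARS <= len(answer) <= MAX_CHARS):
        return False
    lowered = answer.lower()
    return not any(re.search(p, lowered, flags=re.MULTILINE) for p in REFERENCE_PATTERNS)

answers = [
    "Short reply.",                                              # too short, dropped
    "As mentioned in another answer, ..." + "x" * 1500,          # references other answers, dropped
    "A detailed, self-contained explanation ..." + "x" * 1500,   # kept
]
kept = [a for a in answers if keep_answer(a)]
print(len(kept))  # 1
```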