Paper : 2305.11206
Superficial Alignment Hypothesis
Less is More for Alignment (LIMA) was a paper published by Meta that explored fine-tuning on a small set of carefully curated examples. It challenges the notion that RLHF is what imbues models with knowledge and capabilities, and instead argues two main points:
- Almost all knowledge in large language models is learnt during pretraining
- Only limited instruction tuning data is necessary to teach models to produce high-quality output
In short, alignment can be a simple process in which the model learns the style or format for interacting with users. They show this by training LIMA, a pretrained 65B-parameter LLaMA model fine-tuned on their own dataset.
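To make the setup concrete, below is a minimal sketch of this kind of supervised fine-tuning on a small curated prompt/response set, assuming a Hugging Face-style stack. The checkpoint name, prompt formatting, and hyperparameters are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of supervised fine-tuning on a small curated prompt/response set.
# Checkpoint name, formatting, and hyperparameters are illustrative assumptions.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

class CuratedPairs(Dataset):
    """Turns (prompt, response) pairs into token sequences for causal LM training."""
    def __init__(self, pairs, tokenizer, max_len=2048):
        self.examples = []
        for prompt, response in pairs:
            text = prompt + "\n\n" + response + tokenizer.eos_token
            ids = tokenizer(text, truncation=True, max_length=max_len,
                            return_tensors="pt").input_ids[0]
            # For causal LM fine-tuning, the labels are the input ids themselves.
            self.examples.append({"input_ids": ids, "labels": ids.clone()})

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]

model_name = "huggyllama/llama-65b"  # hypothetical stand-in for the 65B base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# In practice this would be the ~1,000 curated examples.
pairs = [("How do I sort a list in Python?",
          "Use the built-in sorted() function, e.g. sorted(my_list) ...")]
train_ds = CuratedPairs(pairs, tokenizer)

args = TrainingArguments(output_dir="lima-style-sft", num_train_epochs=15,
                         per_device_train_batch_size=1, learning_rate=1e-5)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```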
Dataset
The dataset is generated by curating 1,000 examples that approximate real user prompts and high-quality responses.
- 750 top questions and answers are selected from community forums: Stack Exchange (where they sampled answers from a total of 75 STEM exchanges and filtered out ones that might be too niche), Reddit, and wikiHow
- An additional 250 examples of prompts and responses are manually written
Questions and answers that were ultimately selected were filtered on the following criteria (a rough filtering sketch follows the list):
- The helpfulness of the responses
- How self-contained the response was - they discarded answers that referenced other answers or outside information
- Length of the response - answers that were too short (< 1200 characters) or too long (> 4096 characters) were removed from the dataset
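As a rough illustration of these filters, the sketch below applies the length bounds and a naive check for answers that reference other answers. The regex patterns are assumptions for illustration; the paper's actual curation also involved manual review.

```python
# Hedged sketch of the length / self-containedness filtering described above.
import re

MIN_CHARS = 1200   # answers shorter than this were removed
MAX_CHARS = 4096   # answers longer than this were removed

# Crude patterns suggesting an answer refers to other answers or outside context
# (illustrative assumptions, not the paper's actual heuristics).
REFERENCE_PATTERNS = [
    r"as mentioned in (another|the other) answer",
    r"see the (accepted|top) answer",
    r"^edit:",   # edits often respond to comments rather than the question
]

def keep_answer(answer: str) -> bool:
    """Return True if an answer passes the length and self-containedness checks."""
    if not (MIN_CHARS <= len(answer) <= MAX_CHARS):
        return False
    lowered = answer.lower()
    return not any(re.search(p, lowered, flags=re.MULTILINE) for p in REFERENCE_PATTERNS)

answers = [
    "Short reply.",                                              # too short, dropped
    "As mentioned in another answer, ..." + "x" * 1500,          # references other answers, dropped
    "A detailed, self-contained explanation ..." + "x" * 1500,   # kept
]
kept = [a for a in answers if keep_answer(a)]
print(len(kept))  # 1
```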