Large Language Models (LLMs) are powerful tools that can perform a wide range of natural language processing tasks. However, because their training is generic and broadly focused, they do not always perform optimally on specific tasks. Fine-tuning is a technique that adapts a pre-trained LLM to a specific task, such as extractive question answering, without altering the original weights. In this article, we will explore how to fine-tune Llama 3.2 11B using the Q-LoRA technique and demonstrate the performance boost it delivers on the SQuAD v2 dataset.
What is LoRA?
LoRA (Low-Rank Adaptation) is a technique for adding new weights to an existing model to modify its behavior without changing the original weights. It adds small "adapter" weights that modify the outputs of certain layers; only these adapters are updated during training while the original weights stay frozen. Because the original weights are untouched, the model retains its pre-trained knowledge while gaining new, task-specific capabilities through the adapters.
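Concretely, LoRA keeps a layer's weight matrix W frozen and learns two much smaller matrices, A and B, whose product forms a low-rank update scaled by alpha / r; the layer then behaves as if its weight were W + (alpha / r) · BA. Here is a toy sketch of the idea (illustrative only, not the peft library's actual implementation):

import torch

d, r, alpha = 4096, 128, 256              # hidden size, adapter rank, scaling factor
W = torch.randn(d, d)                     # frozen pre-trained weight
A = torch.randn(r, d) * 0.01              # trainable low-rank factor
B = torch.zeros(d, r)                     # trainable low-rank factor, zero-initialized so training starts from W
W_effective = W + (alpha / r) * (B @ A)   # the weight the adapted layer effectively applies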
Defining the Experiment
In this experiment we will fine-tune Llama 3.2 11B for extractive question answering using the SQuAD v2 dataset. The goal is to train the model to extract the specific span of text that directly answers a user's question, without summarizing or rephrasing it.
System Setup
This experiment was run on Google Colab with an A100 GPU. The code is written in Python and uses the Hugging Face Transformers library.
Installing Packages
!pip install -U transformers peft bitsandbytes datasets trl evaluate bert_score
Loading Data
We'll use the SQuAD v2 dataset for training and evaluation.
from datasets import load_dataset
ds = load_dataset("squad_v2")
print(ds)
Output:
DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 130319
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 11873
    })
})
Data Preparation
We'll split the dataset into training, validation, and test sets and convert each sample into a conversation format suitable for Llama.
num_training_samples = 15000
num_test_samples = 750
num_validation_samples = 1000
training_samples = ds['train'].select(range(num_training_samples))
test_samples = ds['train'].select(range(num_training_samples, num_training_samples + num_test_samples))
validation_samples = ds['validation'].select(range(num_validation_samples))
A helper function, convert_squad_sample_to_llama_conversation, turns each SQuAD sample into a Llama chat conversation and returns a dictionary with three fields: the rendered conversation text used for training, the underlying message list, and the gold answer.
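One plausible implementation of this helper is sketched below. The system prompt, the fallback answer for unanswerable questions, and the use of the chat template of the tokenizer loaded in the Model Preparation section are assumptions, not the article's exact code.

def convert_squad_sample_to_llama_conversation(sample):
    # SQuAD v2 marks unanswerable questions with an empty answer list.
    answer = sample["answers"]["text"][0] if sample["answers"]["text"] else "The context does not provide an answer."
    messages = [
        {"role": "system", "content": "Answer the question using only an exact span copied from the context."},
        {"role": "user", "content": f"Context: {sample['context']}\n\nQuestion: {sample['question']}"},
        {"role": "assistant", "content": answer},
    ]
    # Render the conversation into a single training string with the model's chat template.
    sample_conversation = tokenizer.apply_chat_template(messages, tokenize=False)
    return {"text": sample_conversation, "messages": messages, "answer": answer}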
conversation_training_samples = training_samples.map(convert_squad_sample_to_llama_conversation)
conversation_test_samples = test_samples.map(convert_squad_sample_to_llama_conversation)
conversation_validation_samples = validation_samples.map(convert_squad_sample_to_llama_conversation)
Model Preparation
We'll load the Llama 3.2 11B model with 4-bit quantization and set up the LoRA configuration.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)
model.config.pad_token_id = tokenizer.pad_token_id
model.config.use_cache = False
from peft import LoraConfig
rank = 128
alpha = rank*2
peft_config = LoraConfig(
    r=rank,
    lora_alpha=alpha,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'q_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj']
)
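As a quick sanity check, you can wrap the quantized model with this config and print how many parameters the adapters will actually train. This step is optional and is a sketch rather than part of the original recipe; the SFTTrainer below applies peft_config on its own, so if you wrap the model here you should not pass peft_config again.

from peft import get_peft_model, prepare_model_for_kbit_training

# Optional: inspect the adapter size relative to the frozen base model.
peft_model = get_peft_model(prepare_model_for_kbit_training(model), peft_config)
peft_model.print_trainable_parameters()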
Training
We'll use the SFTTrainer from the trl library to train the model.
from transformers import TrainingArguments
from trl import SFTTrainer
model_checkpoint_path = "llama_3_2_squad_checkpoints"  # local output directory for checkpoints (name is arbitrary)
training_arguments = TrainingArguments(
    output_dir=model_checkpoint_path,
    optim='paged_adamw_32bit',
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    log_level='debug',
    evaluation_strategy="steps",
    save_strategy='steps',
    logging_steps=8,
    eval_steps=8,
    save_steps=8,
    learning_rate=1e-4,
    fp16=True,
    num_train_epochs=4,
    max_steps=120,
    warmup_ratio=0.1,
    load_best_model_at_end=True,
    overwrite_output_dir=True,
    lr_scheduler_type='linear',
)
trainer = SFTTrainer(
    model=model,
    train_dataset=conversation_training_samples,
    eval_dataset=conversation_test_samples,
    peft_config=peft_config,
    dataset_text_field='text',
    max_seq_length=1024,
    tokenizer=tokenizer,
    args=training_arguments
)
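With the trainer configured, launching the run and saving the resulting adapter looks roughly like this (the subdirectory name is arbitrary):

trainer.train()
# Persist the best checkpoint's LoRA adapter weights for evaluation.
trainer.save_model(model_checkpoint_path + "/final_model")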
Evaluation
We'll evaluate both the base and fine-tuned models using the BERTScore and exact-match metrics.
from evaluate import load
bert_model = "microsoft/deberta-v2-xxlarge-mnli"
bertscore = load("bertscore")
exact_match_metric = load("exact_match")
Two helper functions drive inference: get_bulk_predictions(pipe, samples) generates an answer for each sample in a batch with a given text-generation pipeline, and get_base_and_tuned_bulk_predictions(samples) runs that generation once with the base model and once with the fine-tuned model, attaching the results as base_prediction and trained_prediction columns.
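A rough sketch of how these helpers might look is below; the two generation pipelines, the prompt handling, and the decoding settings are assumptions rather than the article's exact code.

from transformers import pipeline

# Assumed setup: one text-generation pipeline for the untuned base checkpoint (quantized the
# same way to save memory) and one for the fine-tuned model with its LoRA adapter loaded.
base_pipe = pipeline("text-generation", model=model_name, tokenizer=tokenizer,
                     model_kwargs={"quantization_config": bnb_config}, device_map="auto")
trained_pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

def get_bulk_predictions(pipe, samples):
    # Drop the assistant turn so the model has to generate the answer itself.
    prompts = [messages[:-1] for messages in samples["messages"]]
    outputs = pipe(prompts, max_new_tokens=64, do_sample=False)
    # Keep only the newly generated assistant reply for each conversation.
    return [output[0]["generated_text"][-1]["content"].strip() for output in outputs]

def get_base_and_tuned_bulk_predictions(samples):
    return {
        "base_prediction": get_bulk_predictions(base_pipe, samples),
        "trained_prediction": get_bulk_predictions(trained_pipe, samples),
    }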
conversation_validation_samples = conversation_validation_samples.map(get_base_and_tuned_bulk_predictions, batched=True, batch_size=20)
base_predictions = conversation_validation_samples['base_prediction']
answers = conversation_validation_samples['answer']
base_validation_bert_score = bertscore.compute(predictions=base_predictions, references=answers, lang="en", model_type=bert_model, device="cuda:0")
baseline_exact_match_score = exact_match_metric.compute(predictions=base_predictions, references=answers)
trained_predictions = conversation_validation_samples['trained_prediction']
trained_validation_bert_score = bertscore.compute(predictions=trained_predictions, references=answers, lang="en", model_type=bert_model, device="cuda:0")
tuned_exact_match_score = exact_match_metric.compute(predictions=trained_predictions, references=answers)
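To turn these results into headline numbers, average the per-sample BERTScore F1 values and read the exact-match fields directly (averaging F1 is an assumption about how the scores below were computed):

import numpy as np

print("base BERT score:", np.mean(base_validation_bert_score["f1"]))
print("tuned BERT score:", np.mean(trained_validation_bert_score["f1"]))
print("base exact match:", baseline_exact_match_score["exact_match"])
print("tuned exact match:", tuned_exact_match_score["exact_match"])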
Results
Training took around 1 hour on an A100 GPU. The results show a significant improvement in the model's performance on the validation set.
Metric | Base Model | Tuned Model
BERT Score | 0.6469 | 0.7505
Exact Match | 0.116 | 0.418
Conclusion
This article demonstrated how to fine-tune Llama 3.2 11B for extractive question answering using the Q-LoRA technique. The results show a significant improvement in the model's performance on the validation set, with gains in both the BERTScore and exact-match metrics. The same approach can be applied to other tasks and models, and we hope this article serves as a useful guide for future work.