● Covered by 1 source · 1 report · Medium impact

Direct Preference Optimization Reduces Text Degeneration in OCR Models

Aggregated by BrevFeed ai · updated 4d ago

DharmaOCR applies Direct Preference Optimization (DPO) to combat text degeneration in OCR models. Used as a second training stage after supervised fine-tuning (SFT), DPO reduced degeneration rates by an average of 59.4%, addressing a significant limitation of SFT alone.
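Because the reported degeneration rates never exceed roughly a third, the 59.4% figure has to be a relative reduction, not a drop in percentage points. The sketch below shows that arithmetic under stated assumptions: it treats the average as the mean of per-model relative reductions measured against each model's post-SFT rate, and the function names and rates are hypothetical and invented for illustration, not the paper's data.

```python
from typing import Iterable, Tuple


def relative_reduction(rate_before: float, rate_after: float) -> float:
    """Fractional drop in degeneration rate, e.g. 0.10 -> 0.04 is a 60% reduction."""
    if rate_before <= 0:
        raise ValueError("rate_before must be positive")
    return (rate_before - rate_after) / rate_before


def mean_relative_reduction(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean of per-model relative reductions; each pair is (rate_after_sft, rate_after_dpo).

    Averaging per-model reductions is an assumption about how the 59.4% figure was computed.
    """
    reductions = [relative_reduction(before, after) for before, after in pairs]
    if not reductions:
        raise ValueError("need at least one model")
    return sum(reductions) / len(reductions)


if __name__ == "__main__":
    # Invented rates for three hypothetical models, NOT figures from the paper.
    illustrative = [(0.10, 0.04), (0.05, 0.03), (0.20, 0.05)]
    print(f"mean relative reduction: {mean_relative_reduction(illustrative):.1%}")
```

Note that a mean of per-model reductions can differ from the reduction in the pooled rate, so how the paper aggregated matters when comparing figures.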

Key points

Introduction of DharmaOCR

DharmaOCR, a specialized structured OCR model, was released on Hugging Face in April. A subsequent paper detailed its development methodology and presented benchmark results demonstrating its quality and cost efficiency.

Benchmarking Results

The paper benchmarked several vision-language model families, both open-source and commercial, on structured document extraction. The key metric was the text degeneration rate: how often a model falls into a repetition loop while transcribing Brazilian Portuguese text.
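The summary does not say how a repetition loop is detected. A minimal sketch, assuming a simple tail-repetition heuristic, shows how such a rate could be computed; the function names and the `max_ngram` and `min_repeats` thresholds are hypothetical and not the paper's criterion.

```python
from typing import Iterable


def has_repetition_loop(text: str, max_ngram: int = 8, min_repeats: int = 4) -> bool:
    """Hypothetical heuristic: True if the output ends with one word n-gram
    (n <= max_ngram) repeated at least min_repeats times back to back."""
    tokens = text.split()
    for n in range(1, max_ngram + 1):
        span = n * min_repeats
        if len(tokens) < span:
            break  # longer n-grams need even more tokens
        tail = tokens[-span:]
        unit = tail[:n]
        if all(tail[i:i + n] == unit for i in range(0, span, n)):
            return True
    return False


def degeneration_rate(outputs: Iterable[str], **heuristic_kwargs) -> float:
    """Fraction of transcriptions flagged as looping by has_repetition_loop."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    flagged = sum(has_repetition_loop(o, **heuristic_kwargs) for o in outputs)
    return flagged / len(outputs)


if __name__ == "__main__":
    samples = [
        "Nota fiscal emitida em 12/03/2024 valor total R$ 150,00",
        "Total a pagar: R$ 150,00 R$ 150,00 R$ 150,00 R$ 150,00",
        "de de de de de de",
    ]
    print(degeneration_rate(samples))  # two of the three samples loop
```

A heuristic like this also flags legitimately repetitive content, such as a column of identical values, so a real evaluation would need a more careful criterion.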

Among vanilla (not fine-tuned) models, degeneration rates varied widely, from below 1% to above 33%.

Limitations of Supervised Fine-Tuning

Supervised fine-tuning (SFT) typically lowered degeneration rates but rarely brought them down to levels acceptable in production. This points to a structural limitation: SFT optimizes toward the reference outputs without explicitly penalizing degeneration, so there is a ceiling on how much SFT alone can mitigate this failure.
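To make the point concrete, here is a minimal sketch of the standard token-level SFT objective for a single example; the function name and toy probabilities are illustrative. Its only input is the log-probability of each reference token under teacher forcing, so nothing in the loss refers to a looping output the model might produce when it decodes on its own.

```python
import math
from typing import Sequence


def sft_loss(reference_token_logprobs: Sequence[float]) -> float:
    """Token-level SFT loss for one example: the mean negative log-probability
    of each reference token, each conditioned on the reference prefix
    (teacher forcing) rather than on the model's own earlier outputs."""
    if len(reference_token_logprobs) == 0:
        raise ValueError("need at least one reference token")
    return -sum(reference_token_logprobs) / len(reference_token_logprobs)


if __name__ == "__main__":
    # Toy probabilities the model assigns to three reference tokens.
    logprobs = [math.log(0.9), math.log(0.8), math.log(0.5)]
    print(round(sft_loss(logprobs), 4))
```

Driving this loss down makes the reference text more likely, but it never directly makes a degenerate continuation less likely.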

Implementation of Direct Preference Optimization

Adding Direct Preference Optimization (DPO) as a second stage after SFT reduced degeneration in every model tested, by 59.4% on average and by as much as 87.6%. DPO learns from binary preference signals: for the same input, a correct transcription is marked as preferred and a degenerate, looping output as rejected. This makes it well suited to OCR, where there is an objectively correct transcription and the preference judgment is not subjective.
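The standard DPO loss compares a trainable policy against a frozen reference model on summed sequence log-probabilities of the chosen and rejected outputs. The sketch below implements that loss for one pair, plus a hypothetical pair-building step that pairs a ground-truth transcription with a looping output; the names, the degeneracy check, and `beta=0.1` are assumptions, not the paper's setup.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PreferencePair:
    page_id: str   # hypothetical identifier for the input page image
    chosen: str    # correct transcription (preferred)
    rejected: str  # degenerate, looping output (dispreferred)


def build_pair(page_id: str, ground_truth: str, model_output: str,
               is_degenerate: Callable[[str], bool]) -> Optional[PreferencePair]:
    """Hypothetical pair construction: a looping model output becomes the
    rejected sample and the ground-truth transcription the chosen one."""
    if not is_degenerate(model_output):
        return None
    return PreferencePair(page_id, ground_truth, model_output)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair, from summed sequence log-probabilities:
    -log sigmoid(beta * [(policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)]).
    The ref_* values come from a frozen model, typically the SFT checkpoint."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


if __name__ == "__main__":
    pair = build_pair("page-001", "Total: R$ 150,00",
                      "Total: R$ 150,00 150,00 150,00 150,00",
                      is_degenerate=lambda s: s.count("150,00") > 2)
    print(pair)
    # Policy identical to the reference: margin 0, loss equals log 2.
    print(dpo_loss(-12.0, -45.0, -12.0, -45.0))
    # Policy raised the chosen and lowered the rejected log-prob: loss falls.
    print(dpo_loss(-10.0, -50.0, -12.0, -45.0))
```

Minimizing this loss widens the gap between the preferred and the looping output relative to the reference model, which is the explicit anti-degeneration signal that plain SFT lacks.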

Conclusion and Future Work

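Why SFT falls short on degeneration is still under investigation; the leading hypothesis is a loss-granularity issue. One reading is that SFT's loss is computed token by token, whereas a repetition loop is a property of the whole output, which DPO's comparison of complete transcriptions targets more directly. Either way, applying DPO to OCR marks a significant advance in output quality for these models.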

✨ This summary was generated by AI from the outlets' reporting listed below. It is not independently verified and may contain errors; check the original sources.
