Continuation of #5077. Uses the BIOES dataset produced there.
Goal
Fine-tune DeBERTa-v3-large to predict required phrase spans in license rule text using BIOES labels.
Tasks
- Subword tokenization with label alignment (-100 for continuation tokens)
- Training with class-weighted loss (handle O vs B/I/E/S imbalance)
- Include negative samples (rules without markers, all-O labels) for balanced training
- Evaluation: token F1, exact span match
- ONNX export for CPU inference
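
A rough, dependency-free sketch of three of the tasks above (label alignment, class weights, exact span match) might look like the following. Everything named here is an assumption for illustration, not taken from #5077: the single-type tag set `O/B/I/E/S`, the label ids in `LABEL2ID`, the helper names, and the "balanced" inverse-frequency weighting, which is only one common heuristic. `word_ids` is assumed to have the shape returned by a Hugging Face fast tokenizer's `word_ids()` (one word index per subword, `None` for special tokens).

```python
from collections import Counter

IGNORE_INDEX = -100  # skipped by PyTorch's CrossEntropyLoss by default
# Hypothetical label ids; the real dataset may use a different mapping.
LABEL2ID = {"O": 0, "B": 1, "I": 2, "E": 3, "S": 4}


def align_labels(word_labels, word_ids):
    """Map word-level label ids onto subword tokens.

    Only the first subword of each word keeps its label; continuation
    subwords and special tokens (word id None) get IGNORE_INDEX.
    """
    aligned, previous = [], None
    for wid in word_ids:
        if wid is None or wid == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[wid])
        previous = wid
    return aligned


def inverse_frequency_weights(label_sequences, num_labels):
    """Per-class weights total / (num_labels * count); 0.0 for unseen classes."""
    counts = Counter(
        label for seq in label_sequences for label in seq if label != IGNORE_INDEX
    )
    total = sum(counts.values())
    return [
        total / (num_labels * counts[i]) if counts[i] else 0.0
        for i in range(num_labels)
    ]


def bioes_spans(tags):
    """Return (start, end) word-index pairs, end inclusive, for well-formed spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "S":
            spans.append((i, i))
            start = None
        elif tag == "B":
            start = i
        elif tag == "E":
            if start is not None:
                spans.append((start, i))
            start = None
        elif tag == "O":
            start = None
        # "I" leaves an open span open and is ignored otherwise.
    return spans


def span_f1(gold_tags, pred_tags):
    """Exact span match F1: a predicted span counts only if both ends match."""
    gold, pred = set(bioes_spans(gold_tags)), set(bioes_spans(pred_tags))
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

The weights list could be passed as a tensor to a weighted cross-entropy loss during training. Negative samples (all-O rules) yield no gold spans, so they only affect span precision, not recall.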