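PaddleNLP has released UIE-X 🧾. It keeps all of the existing plain-text extraction features and adds document extraction.
UIE-X continues the UIE approach. It is retrained on ERNIE-Layout (文心ERNIE-Layout), a cross-modal pre-trained model enhanced with layout information, and it jointly models text, image, and layout so that it can deeply understand multimodal documents. Following the prompt paradigm, it performs open-domain information extraction, supports zero-shot extraction, and offers leading few-shot performance.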
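UIE-style models treat extraction as prompted span extraction. A prompt (for example a field name from the schema) is paired with the document, and the model scores every token as a possible span start and a possible span end. This is why the fine-tuning command later in this case study uses the labels `start_positions` and `end_positions`. The sketch below shows one simple way to turn such per-token probabilities into spans. It is a hypothetical, simplified decoder: the function name `decode_spans`, the 0.5 threshold, and the nearest-end pairing rule are assumptions for illustration. It is not PaddleNLP's actual post-processing code.

```python
from typing import List, Tuple


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Turn per-token start/end probabilities into (start, end) index pairs.

    Hypothetical simplified decoder: every token scoring above `threshold`
    is a candidate, and each start is paired with the nearest end at or
    after it.
    """
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        # Candidate end positions that do not precede this start
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


if __name__ == "__main__":
    # Prompt "姓名" (name) applied to the text "姓名张三性别男"
    text = "姓名张三性别男"
    start = [0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0]
    end = [0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0]
    for s, e in decode_spans(start, end):
        print(text[s:e + 1])  # 张三
```

Real decoders also handle overlapping candidates and return probabilities. The nearest-end rule is only the simplest choice that keeps spans non-empty and left-to-right.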
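Project link: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/information_extraction
This case study applies UIE-X to the medical domain. With a small amount of annotation plus fine-tuning, the model gains end-to-end document information extraction tailored to a custom scenario.
The medical field produces large volumes of image data, including lab test reports, medical records, invoices, CT images, and ophthalmology images. At present, staff classify these images by hand and enter them into systems as structured records to support whole-lifecycle patient management.
This process is time-consuming, labor-intensive, and expensive. Using AI to classify and structure the images automatically would substantially cut costs and make data entry more efficient overall.
This case study uses UIE-X, newly open-sourced in PaddleNLP, and takes medical examination sheets (a common image type in the medical domain) as the example. It walks through the full solution: data annotation, model training, and one-click deployment with Taskflow.
Dataset source: https://tianchi.aliyun.com/dataset/126039
Dataset samples:
Common image types in medical scenarios: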
!pip install --upgrade --user paddleocr
!pip install --upgrade --user paddlenlp
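We recommend annotating data with Label-Studio. This case study connects annotation directly to training: after you export data from Label-Studio, the label_studio.py script converts it into the input format the model expects. To annotate your own data on the Label-Studio platform, follow the Label-Studio annotation guide for information extraction tasks.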
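Before running the conversion, it can help to sanity-check an export. The sketch below assumes the standard Label Studio JSON export layout for image tasks. In that layout the export is a list of tasks, each with `annotations[*].result[*]`. Rectangle boxes store `x`, `y`, `width`, and `height` as percentages of the image size, and the image size is given by `original_width` and `original_height`. The helper names are hypothetical, and this is not what label_studio.py does internally.

```python
import json
from collections import Counter
from typing import List

# Assumes the standard Label Studio JSON export layout for image tasks;
# these helpers are hypothetical and independent of label_studio.py.


def count_box_labels(tasks: List[dict]) -> Counter:
    """Count rectangle labels across every annotation in an export."""
    counts: Counter = Counter()
    for task in tasks:
        for annotation in task.get("annotations", []):
            for result in annotation.get("result", []):
                for label in result.get("value", {}).get("rectanglelabels", []):
                    counts[label] += 1
    return counts


def to_pixel_box(result: dict) -> List[float]:
    """Convert a percentage-based box into [x1, y1, x2, y2] pixel coordinates."""
    v = result["value"]
    w, h = result["original_width"], result["original_height"]
    x1 = v["x"] * w / 100
    y1 = v["y"] * h / 100
    x2 = (v["x"] + v["width"]) * w / 100
    y2 = (v["y"] + v["height"]) * h / 100
    return [x1, y1, x2, y2]


if __name__ == "__main__":
    # Tiny inline example; load your own export with json.load(...) instead
    sample = json.loads("""[{"annotations": [{"result": [{
        "type": "rectanglelabels", "original_width": 1000, "original_height": 800,
        "value": {"x": 10, "y": 25, "width": 20, "height": 5,
                  "rectanglelabels": ["姓名"]}}]}]}]""")
    print(count_box_labels(sample))  # Counter({'姓名': 1})
    print(to_pixel_box(sample[0]["annotations"][0]["result"][0]))  # [100.0, 200.0, 300.0, 240.0]
```

Counting labels this way makes it easy to spot schema fields that have too few examples before fine-tuning.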
# Download the annotated data:
!wget https://paddlenlp.bj.bcebos.com/datasets/medical_checklist.zip
!unzip medical_checklist.zip
# Convert the Label-Studio export into train/dev/test files (80% / 20% / 0%) for the extraction task
!python label_studio.py \
--label_studio_file ./medical_checklist/label_studio.json \
--save_dir ./medical_checklist \
--splits 0.8 0.2 0 \
--task_type ext
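`--splits 0.8 0.2 0` gives the train/dev/test ratios for the converted data, so no examples go to the test split in this run. A minimal sketch of ratio-based splitting follows. The function name `split_by_ratio`, the fixed-seed shuffle, and the rounding rule are assumptions for illustration, and label_studio.py's exact ordering and rounding may differ.

```python
import random
from typing import List, Sequence, Tuple


def split_by_ratio(items: Sequence, ratios: Tuple[float, float, float],
                   seed: int = 1000) -> Tuple[List, List, List]:
    """Shuffle a copy of `items` and cut it into train/dev/test by `ratios`.

    Hypothetical helper: cut points come from cumulative ratios, so a
    zero test ratio always yields an empty test split.
    """
    if abs(sum(ratios) - 1.0) > 1e-6:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n = len(shuffled)
    train_end = round(n * ratios[0])
    dev_end = round(n * (ratios[0] + ratios[1]))
    return shuffled[:train_end], shuffled[train_end:dev_end], shuffled[dev_end:]


if __name__ == "__main__":
    train, dev, test = split_by_ratio(list(range(10)), (0.8, 0.2, 0.0))
    print(len(train), len(dev), len(test))  # 8 2 0
```

Using cumulative cut points avoids leftover items leaking into a split whose ratio is zero.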
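# Fine-tune uie-x-base on the converted data, evaluate on the dev set, and export the best model for inference
!python finetune.py \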
--device gpu \
--logging_steps 5 \
--save_steps 25 \
--eval_steps 25 \
--seed 42 \
--model_name_or_path uie-x-base \
--output_dir ./checkpoint/model_best \
--train_path medical_checklist/train.txt \
--dev_path medical_checklist/dev.txt \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 16 \
--num_train_epochs 5 \
--learning_rate 1e-5 \
--label_names 'start_positions' 'end_positions' \
--do_train \
--do_eval \
--do_export \
--export_model_dir ./checkpoint/model_best \
--overwrite_output_dir \
--disable_tqdm True \
--metric_for_best_model eval_f1 \
--load_best_model_at_end True \
--save_total_limit 1
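The training log below reports `Num examples = 686`. This run uses `--per_device_train_batch_size 16` and `--num_train_epochs 5` on a single device with no gradient accumulation, and the log shows `dataloader_drop_last: False`. Under those conditions, the step counts the log prints can be reproduced by hand. The helper below is a hypothetical illustration of that arithmetic, not PaddleNLP's own code.

```python
import math


def training_schedule(num_examples: int, batch_size: int, epochs: int) -> dict:
    """Step counts for one device, no accumulation, last partial batch kept."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
        "total_samples": num_examples * epochs,
    }


if __name__ == "__main__":
    sched = training_schedule(686, 16, 5)
    print(sched)  # {'steps_per_epoch': 43, 'total_steps': 215, 'total_samples': 3430}
    # Fractional epoch at global_step 5, as shown in the first loss line
    print(round(5 / sched["steps_per_epoch"], 4))  # 0.1163
```

The results match the log's `Total optimization steps = 215.0` and `Total num train samples = 3430.0`. With `--eval_steps 25`, each evaluation therefore happens a little more than half an epoch apart.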
[2023-07-21 15:36:09,684] [ WARNING] - evaluation_strategy reset to IntervalStrategy.STEPS for do_eval is True. you can also set evaluation_strategy='epoch'.
[2023-07-21 15:36:09,684] [ INFO] - The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2023-07-21 15:36:09,684] [ INFO] - ============================================================
[2023-07-21 15:36:09,685] [ INFO] - Model Configuration Arguments
[2023-07-21 15:36:09,685] [ INFO] - paddle commit id :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:36:09,685] [ INFO] - export_model_dir :./checkpoint/model_best
[2023-07-21 15:36:09,685] [ INFO] - model_name_or_path :uie-x-base
[2023-07-21 15:36:09,685] [ INFO] -
[2023-07-21 15:36:09,685] [ INFO] - ============================================================
[2023-07-21 15:36:09,685] [ INFO] - Data Configuration Arguments
[2023-07-21 15:36:09,685] [ INFO] - paddle commit id :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:36:09,685] [ INFO] - dev_path :medical_checklist/dev.txt
[2023-07-21 15:36:09,685] [ INFO] - max_seq_len :512
[2023-07-21 15:36:09,685] [ INFO] - train_path :medical_checklist/train.txt
[2023-07-21 15:36:09,685] [ INFO] -
[2023-07-21 15:36:09,685] [ WARNING] - Process rank: -1, device: gpu, world_size: 1, distributed training: False, 16-bits training: False
[2023-07-21 15:36:09,686] [ INFO] - Model config ErnieLayoutConfig {
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"coordinate_size": 128,
"enable_recompute": false,
"eos_token_id": 2,
"fuse": false,
"gradient_checkpointing": false,
"has_relative_attention_bias": true,
"has_spatial_attention_bias": true,
"has_visual_segment_embedding": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"image_feature_pool_shape": [
7,
7,
256
],
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_2d_position_embeddings": 1024,
"max_position_embeddings": 514,
"max_rel_2d_pos": 256,
"max_rel_pos": 128,
"model_type": "ernie_layout",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"output_past": true,
"pad_token_id": 1,
"paddlenlp_version": null,
"pool_act": "tanh",
"rel_2d_pos_bins": 64,
"rel_pos_bins": 32,
"shape_size": 128,
"task_id": 0,
"task_type_vocab_size": 3,
"type_vocab_size": 100,
"use_task_id": true,
"vocab_size": 250002
}
[2023-07-21 15:36:09,687] [ INFO] - Configuration saved in /home/aistudio/.paddlenlp/models/uie-x-base/config.json
[2023-07-21 15:36:09,687] [ INFO] - Downloading uie_x_base.pdparams from https://bj.bcebos.com/paddlenlp/models/transformers/uie_x/uie_x_base.pdparams
100%|██████████████████████████████████████| 1.05G/1.05G [00:15<00:00, 73.4MB/s]
W0721 15:36:28.591925 856 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0721 15:36:28.595674 856 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
[2023-07-21 15:36:30,069] [ INFO] - All model checkpoint weights were used when initializing UIEX.
[2023-07-21 15:36:30,069] [ INFO] - All the weights of UIEX were initialized from the model checkpoint at uie-x-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use UIEX for predictions without further training.
[2023-07-21 15:36:30,070] [ INFO] - We are using <class 'paddlenlp.transformers.ernie_layout.tokenizer.ErnieLayoutTokenizer'> to load 'uie-x-base'.
[2023-07-21 15:36:30,071] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/ernie_layout/vocab.txt and saved to /home/aistudio/.paddlenlp/models/uie-x-base
[2023-07-21 15:36:30,132] [ INFO] - Downloading vocab.txt from https://bj.bcebos.com/paddlenlp/models/transformers/ernie_layout/vocab.txt
100%|██████████████████████████████████████| 2.70M/2.70M [00:00<00:00, 48.4MB/s]
[2023-07-21 15:36:30,263] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/ernie_layout/sentencepiece.bpe.model and saved to /home/aistudio/.paddlenlp/models/uie-x-base
[2023-07-21 15:36:30,325] [ INFO] - Downloading sentencepiece.bpe.model from https://bj.bcebos.com/paddlenlp/models/transformers/ernie_layout/sentencepiece.bpe.model
100%|██████████████████████████████████████| 4.83M/4.83M [00:00<00:00, 63.2MB/s]
[2023-07-21 15:36:31,214] [ INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/uie-x-base/tokenizer_config.json
[2023-07-21 15:36:31,214] [ INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/uie-x-base/special_tokens_map.json
[2023-07-21 15:36:33,843] [ INFO] - ============================================================
[2023-07-21 15:36:33,844] [ INFO] - Training Configuration Arguments
[2023-07-21 15:36:33,844] [ INFO] - paddle commit id :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:36:33,844] [ INFO] - _no_sync_in_gradient_accumulation:True
[2023-07-21 15:36:33,844] [ INFO] - activation_quantize_type :None
[2023-07-21 15:36:33,844] [ INFO] - adam_beta1 :0.9
[2023-07-21 15:36:33,844] [ INFO] - adam_beta2 :0.999
[2023-07-21 15:36:33,844] [ INFO] - adam_epsilon :1e-08
[2023-07-21 15:36:33,844] [ INFO] - algo_list :None
[2023-07-21 15:36:33,844] [ INFO] - batch_num_list :None
[2023-07-21 15:36:33,844] [ INFO] - batch_size_list :None
[2023-07-21 15:36:33,844] [ INFO] - bf16 :False
[2023-07-21 15:36:33,844] [ INFO] - bf16_full_eval :False
[2023-07-21 15:36:33,844] [ INFO] - bias_correction :False
[2023-07-21 15:36:33,844] [ INFO] - current_device :gpu:0
[2023-07-21 15:36:33,844] [ INFO] - dataloader_drop_last :False
[2023-07-21 15:36:33,844] [ INFO] - dataloader_num_workers :0
[2023-07-21 15:36:33,845] [ INFO] - device :gpu
[2023-07-21 15:36:33,845] [ INFO] - disable_tqdm :True
[2023-07-21 15:36:33,845] [ INFO] - do_compress :False
[2023-07-21 15:36:33,845] [ INFO] - do_eval :True
[2023-07-21 15:36:33,845] [ INFO] - do_export :True
[2023-07-21 15:36:33,845] [ INFO] - do_predict :False
[2023-07-21 15:36:33,845] [ INFO] - do_train :True
[2023-07-21 15:36:33,845] [ INFO] - eval_batch_size :16
[2023-07-21 15:36:33,845] [ INFO] - eval_steps :25
[2023-07-21 15:36:33,845] [ INFO] - evaluation_strategy :IntervalStrategy.STEPS
[2023-07-21 15:36:33,845] [ INFO] - flatten_param_grads :False
[2023-07-21 15:36:33,845] [ INFO] - fp16 :False
[2023-07-21 15:36:33,845] [ INFO] - fp16_full_eval :False
[2023-07-21 15:36:33,845] [ INFO] - fp16_opt_level :O1
[2023-07-21 15:36:33,845] [ INFO] - gradient_accumulation_steps :1
[2023-07-21 15:36:33,845] [ INFO] - greater_is_better :True
[2023-07-21 15:36:33,845] [ INFO] - ignore_data_skip :False
[2023-07-21 15:36:33,845] [ INFO] - input_dtype :int64
[2023-07-21 15:36:33,845] [ INFO] - input_infer_model_path :None
[2023-07-21 15:36:33,845] [ INFO] - label_names :['start_positions', 'end_positions']
[2023-07-21 15:36:33,845] [ INFO] - lazy_data_processing :True
[2023-07-21 15:36:33,845] [ INFO] - learning_rate :1e-05
[2023-07-21 15:36:33,845] [ INFO] - load_best_model_at_end :True
[2023-07-21 15:36:33,845] [ INFO] - local_process_index :0
[2023-07-21 15:36:33,845] [ INFO] - local_rank :-1
[2023-07-21 15:36:33,845] [ INFO] - log_level :-1
[2023-07-21 15:36:33,845] [ INFO] - log_level_replica :-1
[2023-07-21 15:36:33,846] [ INFO] - log_on_each_node :True
[2023-07-21 15:36:33,846] [ INFO] - logging_dir :./checkpoint/model_best/runs/Jul21_15-36-09_jupyter-2631487-6518069
[2023-07-21 15:36:33,846] [ INFO] - logging_first_step :False
[2023-07-21 15:36:33,846] [ INFO] - logging_steps :5
[2023-07-21 15:36:33,846] [ INFO] - logging_strategy :IntervalStrategy.STEPS
[2023-07-21 15:36:33,846] [ INFO] - lr_scheduler_type :SchedulerType.LINEAR
[2023-07-21 15:36:33,846] [ INFO] - max_grad_norm :1.0
[2023-07-21 15:36:33,846] [ INFO] - max_steps :-1
[2023-07-21 15:36:33,846] [ INFO] - metric_for_best_model :eval_f1
[2023-07-21 15:36:33,846] [ INFO] - minimum_eval_times :None
[2023-07-21 15:36:33,846] [ INFO] - moving_rate :0.9
[2023-07-21 15:36:33,846] [ INFO] - no_cuda :False
[2023-07-21 15:36:33,846] [ INFO] - num_train_epochs :5.0
[2023-07-21 15:36:33,846] [ INFO] - onnx_format :True
[2023-07-21 15:36:33,846] [ INFO] - optim :OptimizerNames.ADAMW
[2023-07-21 15:36:33,846] [ INFO] - output_dir :./checkpoint/model_best
[2023-07-21 15:36:33,846] [ INFO] - overwrite_output_dir :True
[2023-07-21 15:36:33,846] [ INFO] - past_index :-1
[2023-07-21 15:36:33,846] [ INFO] - per_device_eval_batch_size :16
[2023-07-21 15:36:33,846] [ INFO] - per_device_train_batch_size :16
[2023-07-21 15:36:33,846] [ INFO] - prediction_loss_only :False
[2023-07-21 15:36:33,846] [ INFO] - process_index :0
[2023-07-21 15:36:33,846] [ INFO] - prune_embeddings :False
[2023-07-21 15:36:33,846] [ INFO] - recompute :False
[2023-07-21 15:36:33,846] [ INFO] - remove_unused_columns :True
[2023-07-21 15:36:33,846] [ INFO] - report_to :['visualdl']
[2023-07-21 15:36:33,846] [ INFO] - resume_from_checkpoint :None
[2023-07-21 15:36:33,846] [ INFO] - round_type :round
[2023-07-21 15:36:33,847] [ INFO] - run_name :./checkpoint/model_best
[2023-07-21 15:36:33,847] [ INFO] - save_on_each_node :False
[2023-07-21 15:36:33,847] [ INFO] - save_steps :25
[2023-07-21 15:36:33,847] [ INFO] - save_strategy :IntervalStrategy.STEPS
[2023-07-21 15:36:33,847] [ INFO] - save_total_limit :1
[2023-07-21 15:36:33,847] [ INFO] - scale_loss :32768
[2023-07-21 15:36:33,847] [ INFO] - seed :42
[2023-07-21 15:36:33,847] [ INFO] - sharding :[]
[2023-07-21 15:36:33,847] [ INFO] - sharding_degree :-1
[2023-07-21 15:36:33,847] [ INFO] - should_log :True
[2023-07-21 15:36:33,847] [ INFO] - should_save :True
[2023-07-21 15:36:33,847] [ INFO] - skip_memory_metrics :True
[2023-07-21 15:36:33,847] [ INFO] - strategy :dynabert+ptq
[2023-07-21 15:36:33,847] [ INFO] - train_batch_size :16
[2023-07-21 15:36:33,847] [ INFO] - use_pact :True
[2023-07-21 15:36:33,847] [ INFO] - warmup_ratio :0.1
[2023-07-21 15:36:33,847] [ INFO] - warmup_steps :0
[2023-07-21 15:36:33,847] [ INFO] - weight_decay :0.0
[2023-07-21 15:36:33,847] [ INFO] - weight_quantize_type :channel_wise_abs_max
[2023-07-21 15:36:33,847] [ INFO] - width_mult_list :None
[2023-07-21 15:36:33,847] [ INFO] - world_size :1
[2023-07-21 15:36:33,847] [ INFO] -
[2023-07-21 15:36:33,849] [ INFO] - ***** Running training *****
[2023-07-21 15:36:33,849] [ INFO] - Num examples = 686
[2023-07-21 15:36:33,849] [ INFO] - Num Epochs = 5
[2023-07-21 15:36:33,849] [ INFO] - Instantaneous batch size per device = 16
[2023-07-21 15:36:33,849] [ INFO] - Total train batch size (w. parallel, distributed & accumulation) = 16
[2023-07-21 15:36:33,849] [ INFO] - Gradient Accumulation steps = 1
[2023-07-21 15:36:33,849] [ INFO] - Total optimization steps = 215.0
[2023-07-21 15:36:33,849] [ INFO] - Total num train samples = 3430.0
[2023-07-21 15:36:33,856] [ INFO] - Number of trainable parameters = 281693122
[2023-07-21 15:36:55,804] [ INFO] - loss: 0.00139983, learning_rate: 1e-05, global_step: 5, interval_runtime: 21.9466, interval_samples_per_second: 3.645, interval_steps_per_second: 0.228, epoch: 0.1163
[2023-07-21 15:37:17,246] [ INFO] - loss: 0.00095238, learning_rate: 1e-05, global_step: 10, interval_runtime: 21.4431, interval_samples_per_second: 3.731, interval_steps_per_second: 0.233, epoch: 0.2326
[2023-07-21 15:37:38,397] [ INFO] - loss: 0.00227169, learning_rate: 1e-05, global_step: 15, interval_runtime: 21.1288, interval_samples_per_second: 3.786, interval_steps_per_second: 0.237, epoch: 0.3488
[2023-07-21 15:37:59,719] [ INFO] - loss: 0.00058537, learning_rate: 1e-05, global_step: 20, interval_runtime: 21.3431, interval_samples_per_second: 3.748, interval_steps_per_second: 0.234, epoch: 0.4651
[2023-07-21 15:38:20,879] [ INFO] - loss: 0.00099298, learning_rate: 1e-05, global_step: 25, interval_runtime: 21.1605, interval_samples_per_second: 3.781, interval_steps_per_second: 0.236, epoch: 0.5814
[2023-07-21 15:38:20,879] [ INFO] - ***** Running Evaluation *****
[2023-07-21 15:38:20,880] [ INFO] - Num examples = 35
[2023-07-21 15:38:20,880] [ INFO] - Total prediction steps = 3
[2023-07-21 15:38:20,880] [ INFO] - Pre device batch size = 16
[2023-07-21 15:38:20,880] [ INFO] - Total Batch size = 16
[2023-07-21 15:38:31,387] [ INFO] - eval_loss: 0.0014212249079719186, eval_precision: 0.9344262295081968, eval_recall: 0.9047619047619048, eval_f1: 0.9193548387096775, eval_runtime: 10.5013, eval_samples_per_second: 3.333, eval_steps_per_second: 0.286, epoch: 0.5814
[2023-07-21 15:38:31,387] [ INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-25
[2023-07-21 15:38:31,390] [ INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-25/config.json
[2023-07-21 15:38:33,536] [ INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-25/tokenizer_config.json
[2023-07-21 15:38:33,537] [ INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-25/special_tokens_map.json
[2023-07-21 15:38:46,593] [ INFO] - loss: 0.00054665, learning_rate: 1e-05, global_step: 30, interval_runtime: 25.7138, interval_samples_per_second: 3.111, interval_steps_per_second: 0.194, epoch: 0.6977
[2023-07-21 15:39:07,860] [ INFO] - loss: 0.00042223, learning_rate: 1e-05, global_step: 35, interval_runtime: 21.2605, interval_samples_per_second: 3.763, interval_steps_per_second: 0.235, epoch: 0.814
[2023-07-21 15:39:29,450] [ INFO] - loss: 0.00070746, learning_rate: 1e-05, global_step: 40, interval_runtime: 21.5964, interval_samples_per_second: 3.704, interval_steps_per_second: 0.232, epoch: 0.9302
[2023-07-21 15:39:50,745] [ INFO] - loss: 0.00027768, learning_rate: 1e-05, global_step: 45, interval_runtime: 21.2946, interval_samples_per_second: 3.757, interval_steps_per_second: 0.235, epoch: 1.0465
[2023-07-21 15:40:12,219] [ INFO] - loss: 0.00037302, learning_rate: 1e-05, global_step: 50, interval_runtime: 21.4753, interval_samples_per_second: 3.725, interval_steps_per_second: 0.233, epoch: 1.1628
[2023-07-21 15:40:12,220] [ INFO] - ***** Running Evaluation *****
[2023-07-21 15:40:12,220] [ INFO] - Num examples = 35
[2023-07-21 15:40:12,220] [ INFO] - Total prediction steps = 3
[2023-07-21 15:40:12,220] [ INFO] - Pre device batch size = 16
[2023-07-21 15:40:12,221] [ INFO] - Total Batch size = 16
[2023-07-21 15:40:22,304] [ INFO] - eval_loss: 0.0014475114876404405, eval_precision: 0.9482758620689655, eval_recall: 0.873015873015873, eval_f1: 0.9090909090909091, eval_runtime: 10.0828, eval_samples_per_second: 3.471, eval_steps_per_second: 0.298, epoch: 1.1628
[2023-07-21 15:40:22,305] [ INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-50
[2023-07-21 15:40:22,308] [ INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-50/config.json
[2023-07-21 15:40:24,464] [ INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-50/tokenizer_config.json
[2023-07-21 15:40:24,465] [ INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-50/special_tokens_map.json
[2023-07-21 15:40:37,740] [ INFO] - loss: 0.00019248, learning_rate: 1e-05, global_step: 55, interval_runtime: 25.5206, interval_samples_per_second: 3.135, interval_steps_per_second: 0.196, epoch: 1.2791
[2023-07-21 15:40:58,905] [ INFO] - loss: 0.00021258, learning_rate: 1e-05, global_step: 60, interval_runtime: 21.1645, interval_samples_per_second: 3.78, interval_steps_per_second: 0.236, epoch: 1.3953
[2023-07-21 15:41:20,213] [ INFO] - loss: 0.00024681, learning_rate: 1e-05, global_step: 65, interval_runtime: 21.3084, interval_samples_per_second: 3.754, interval_steps_per_second: 0.235, epoch: 1.5116
[2023-07-21 15:41:41,237] [ INFO] - loss: 0.000169, learning_rate: 1e-05, global_step: 70, interval_runtime: 21.024, interval_samples_per_second: 3.805, interval_steps_per_second: 0.238, epoch: 1.6279
[2023-07-21 15:42:02,163] [ INFO] - loss: 0.00036645, learning_rate: 1e-05, global_step: 75, interval_runtime: 20.9256, interval_samples_per_second: 3.823, interval_steps_per_second: 0.239, epoch: 1.7442
[2023-07-21 15:42:02,163] [ INFO] - ***** Running Evaluation *****
[2023-07-21 15:42:02,163] [ INFO] - Num examples = 35
[2023-07-21 15:42:02,164] [ INFO] - Total prediction steps = 3
[2023-07-21 15:42:02,164] [ INFO] - Pre device batch size = 16
[2023-07-21 15:42:02,164] [ INFO] - Total Batch size = 16
[2023-07-21 15:42:12,158] [ INFO] - eval_loss: 0.001322056632488966, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 9.9708, eval_samples_per_second: 3.51, eval_steps_per_second: 0.301, epoch: 1.7442
[2023-07-21 15:42:12,159] [ INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-75
[2023-07-21 15:42:12,161] [ INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-75/config.json
[2023-07-21 15:42:14,264] [ INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-75/tokenizer_config.json
[2023-07-21 15:42:14,264] [ INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-75/special_tokens_map.json
[2023-07-21 15:42:18,485] [ INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-25] due to args.save_total_limit
[2023-07-21 15:42:27,793] [ INFO] - loss: 0.00060927, learning_rate: 1e-05, global_step: 80, interval_runtime: 25.6304, interval_samples_per_second: 3.121, interval_steps_per_second: 0.195, epoch: 1.8605
[2023-07-21 15:42:48,729] [ INFO] - loss: 0.00068383, learning_rate: 1e-05, global_step: 85, interval_runtime: 20.9361, interval_samples_per_second: 3.821, interval_steps_per_second: 0.239, epoch: 1.9767
[2023-07-21 15:43:09,835] [ INFO] - loss: 0.00042777, learning_rate: 1e-05, global_step: 90, interval_runtime: 21.1056, interval_samples_per_second: 3.79, interval_steps_per_second: 0.237, epoch: 2.093
[2023-07-21 15:43:30,942] [ INFO] - loss: 0.00013877, learning_rate: 1e-05, global_step: 95, interval_runtime: 21.1075, interval_samples_per_second: 3.79, interval_steps_per_second: 0.237, epoch: 2.2093
[2023-07-21 15:43:52,187] [ INFO] - loss: 0.00042886, learning_rate: 1e-05, global_step: 100, interval_runtime: 21.2446, interval_samples_per_second: 3.766, interval_steps_per_second: 0.235, epoch: 2.3256
[2023-07-21 15:43:52,188] [ INFO] - ***** Running Evaluation *****
[2023-07-21 15:43:52,188] [ INFO] - Num examples = 35
[2023-07-21 15:43:52,188] [ INFO] - Total prediction steps = 3
[2023-07-21 15:43:52,188] [ INFO] - Pre device batch size = 16
[2023-07-21 15:43:52,188] [ INFO] - Total Batch size = 16
[2023-07-21 15:44:02,369] [ INFO] - eval_loss: 0.001290834159590304, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 10.1799, eval_samples_per_second: 3.438, eval_steps_per_second: 0.295, epoch: 2.3256
[2023-07-21 15:44:02,369] [ INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-100
[2023-07-21 15:44:02,371] [ INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-100/config.json
[2023-07-21 15:44:04,511] [ INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-100/tokenizer_config.json
[2023-07-21 15:44:04,511] [ INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-100/special_tokens_map.json
[2023-07-21 15:44:08,763] [ INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-50] due to args.save_total_limit
[2023-07-21 15:44:17,868] [ INFO] - loss: 0.00011366, learning_rate: 1e-05, global_step: 105, interval_runtime: 25.6806, interval_samples_per_second: 3.115, interval_steps_per_second: 0.195, epoch: 2.4419
[2023-07-21 15:44:39,049] [ INFO] - loss: 4.777e-05, learning_rate: 1e-05, global_step: 110, interval_runtime: 21.1812, interval_samples_per_second: 3.777, interval_steps_per_second: 0.236, epoch: 2.5581
[2023-07-21 15:45:00,245] [ INFO] - loss: 0.00013845, learning_rate: 1e-05, global_step: 115, interval_runtime: 21.1969, interval_samples_per_second: 3.774, interval_steps_per_second: 0.236, epoch: 2.6744
[2023-07-21 15:45:21,118] [ INFO] - loss: 0.00040561, learning_rate: 1e-05, global_step: 120, interval_runtime: 20.8727, interval_samples_per_second: 3.833, interval_steps_per_second: 0.24, epoch: 2.7907
[2023-07-21 15:45:41,985] [ INFO] - loss: 0.00054928, learning_rate: 1e-05, global_step: 125, interval_runtime: 20.8671, interval_samples_per_second: 3.834, interval_steps_per_second: 0.24, epoch: 2.907
[2023-07-21 15:45:41,986] [ INFO] - ***** Running Evaluation *****
[2023-07-21 15:45:41,986] [ INFO] - Num examples = 35
[2023-07-21 15:45:41,986] [ INFO] - Total prediction steps = 3
[2023-07-21 15:45:41,986] [ INFO] - Pre device batch size = 16
[2023-07-21 15:45:41,986] [ INFO] - Total Batch size = 16
[2023-07-21 15:45:52,179] [ INFO] - eval_loss: 0.0013684021541848779, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 10.1923, eval_samples_per_second: 3.434, eval_steps_per_second: 0.294, epoch: 2.907
[2023-07-21 15:45:52,180] [ INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-125
[2023-07-21 15:45:52,182] [ INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-125/config.json
[2023-07-21 15:45:54,324] [ INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-125/tokenizer_config.json
[2023-07-21 15:45:54,324] [ INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-125/special_tokens_map.json
[2023-07-21 15:45:58,570] [ INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-100] due to args.save_total_limit
[2023-07-21 15:46:07,445] [ INFO] - loss: 5.219e-05, learning_rate: 1e-05, global_step: 130, interval_runtime: 25.4597, interval_samples_per_second: 3.142, interval_steps_per_second: 0.196, epoch: 3.0233
[2023-07-21 15:46:28,712] [ INFO] - loss: 0.00026077, learning_rate: 1e-05, global_step: 135, interval_runtime: 21.2671, interval_samples_per_second: 3.762, interval_steps_per_second: 0.235, epoch: 3.1395
[2023-07-21 15:46:49,731] [ INFO] - loss: 6.99e-05, learning_rate: 1e-05, global_step: 140, interval_runtime: 21.0185, interval_samples_per_second: 3.806, interval_steps_per_second: 0.238, epoch: 3.2558
[2023-07-21 15:47:10,751] [ INFO] - loss: 0.00023049, learning_rate: 1e-05, global_step: 145, interval_runtime: 21.0205, interval_samples_per_second: 3.806, interval_steps_per_second: 0.238, epoch: 3.3721
[2023-07-21 15:47:31,889] [ INFO] - loss: 0.00015275, learning_rate: 1e-05, global_step: 150, interval_runtime: 21.1372, interval_samples_per_second: 3.785, interval_steps_per_second: 0.237, epoch: 3.4884
[2023-07-21 15:47:31,889] [ INFO] - ***** Running Evaluation *****
[2023-07-21 15:47:31,889] [ INFO] - Num examples = 35
[2023-07-21 15:47:31,889] [ INFO] - Total prediction steps = 3
[2023-07-21 15:47:31,890] [ INFO] - Pre device batch size = 16
[2023-07-21 15:47:31,890] [ INFO] - Total Batch size = 16
[2023-07-21 15:47:42,271] [ INFO] - eval_loss: 0.0013476903550326824, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 10.3813, eval_samples_per_second: 3.371, eval_steps_per_second: 0.289, epoch: 3.4884
[2023-07-21 15:47:42,272] [ INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-150
[2023-07-21 15:47:42,274] [ INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-150/config.json
[2023-07-21 15:47:44,424] [ INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-150/tokenizer_config.json
[2023-07-21 15:47:44,424] [ INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-150/special_tokens_map.json
[2023-07-21 15:47:48,728] [ INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-125] due to args.save_total_limit
[2023-07-21 15:47:57,472] [ INFO] - loss: 0.00024907, learning_rate: 1e-05, global_step: 155, interval_runtime: 25.5832, interval_samples_per_second: 3.127, interval_steps_per_second: 0.195, epoch: 3.6047
[2023-07-21 15:48:18,254] [ INFO] - loss: 0.00027028, learning_rate: 1e-05, global_step: 160, interval_runtime: 20.7824, interval_samples_per_second: 3.849, interval_steps_per_second: 0.241, epoch: 3.7209
[2023-07-21 15:48:39,309] [ INFO] - loss: 0.0001771, learning_rate: 1e-05, global_step: 165, interval_runtime: 21.0551, interval_samples_per_second: 3.8, interval_steps_per_second: 0.237, epoch: 3.8372
[2023-07-21 15:49:00,354] [ INFO] - loss: 0.00024041, learning_rate: 1e-05, global_step: 170, interval_runtime: 21.0449, interval_samples_per_second: 3.801, interval_steps_per_second: 0.238, epoch: 3.9535
[2023-07-21 15:49:21,382] [ INFO] - loss: 4.51e-05, learning_rate: 1e-05, global_step: 175, interval_runtime: 21.0273, interval_samples_per_second: 3.805, interval_steps_per_second: 0.238, epoch: 4.0698
[2023-07-21 15:49:21,382] [ INFO] - ***** Running Evaluation *****
[2023-07-21 15:49:21,382] [ INFO] - Num examples = 35
[2023-07-21 15:49:21,382] [ INFO] - Total prediction steps = 3
[2023-07-21 15:49:21,382] [ INFO] - Pre device batch size = 16
[2023-07-21 15:49:21,382] [ INFO] - Total Batch size = 16
[2023-07-21 15:49:31,953] [ INFO] - eval_loss: 0.0013263615546748042, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 10.57, eval_samples_per_second: 3.311, eval_steps_per_second: 0.284, epoch: 4.0698
[2023-07-21 15:49:31,954] [ INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-175
[2023-07-21 15:49:31,956] [ INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-175/config.json
[2023-07-21 15:49:34,699] [ INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-175/tokenizer_config.json
[2023-07-21 15:49:34,700] [ INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-175/special_tokens_map.json
[2023-07-21 15:49:40,286] [ INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-150] due to args.save_total_limit
[2023-07-21 15:49:48,671] [ INFO] - loss: 0.0003263, learning_rate: 1e-05, global_step: 180, interval_runtime: 27.2898, interval_samples_per_second: 2.931, interval_steps_per_second: 0.183, epoch: 4.186
[2023-07-21 15:50:09,486] [ INFO] - loss: 0.00014406, learning_rate: 1e-05, global_step: 185, interval_runtime: 20.8144, interval_samples_per_second: 3.843, interval_steps_per_second: 0.24, epoch: 4.3023
[2023-07-21 15:50:31,097] [ INFO] - loss: 0.00010923, learning_rate: 1e-05, global_step: 190, interval_runtime: 21.6107, interval_samples_per_second: 3.702, interval_steps_per_second: 0.231, epoch: 4.4186
[2023-07-21 15:50:52,282] [ INFO] - loss: 8.216e-05, learning_rate: 1e-05, global_step: 195, interval_runtime: 21.1856, interval_samples_per_second: 3.776, interval_steps_per_second: 0.236, epoch: 4.5349
[2023-07-21 15:51:14,299] [ INFO] - loss: 9.251e-05, learning_rate: 1e-05, global_step: 200, interval_runtime: 22.0164, interval_samples_per_second: 3.634, interval_steps_per_second: 0.227, epoch: 4.6512
[2023-07-21 15:51:14,299] [ INFO] - ***** Running Evaluation *****
[2023-07-21 15:51:14,299] [ INFO] - Num examples = 35
[2023-07-21 15:51:14,299] [ INFO] - Total prediction steps = 3
[2023-07-21 15:51:14,299] [ INFO] - Pre device batch size = 16
[2023-07-21 15:51:14,300] [ INFO] - Total Batch size = 16
[2023-07-21 15:51:24,773] [ INFO] - eval_loss: 0.0014609990175813437, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 10.4732, eval_samples_per_second: 3.342, eval_steps_per_second: 0.286, epoch: 4.6512
[2023-07-21 15:51:24,774] [ INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-200
[2023-07-21 15:51:24,776] [ INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-200/config.json
[2023-07-21 15:51:27,228] [ INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-200/tokenizer_config.json
[2023-07-21 15:51:27,228] [ INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-200/special_tokens_map.json
[2023-07-21 15:51:32,347] [ INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-175] due to args.save_total_limit
[2023-07-21 15:51:41,379] [ INFO] - loss: 0.00016781, learning_rate: 1e-05, global_step: 205, interval_runtime: 27.0808, interval_samples_per_second: 2.954, interval_steps_per_second: 0.185, epoch: 4.7674
[2023-07-21 15:52:03,510] [ INFO] - loss: 0.00013611, learning_rate: 1e-05, global_step: 210, interval_runtime: 22.1302, interval_samples_per_second: 3.615, interval_steps_per_second: 0.226, epoch: 4.8837
[2023-07-21 15:52:23,996] [ INFO] - loss: 0.0001641, learning_rate: 1e-05, global_step: 215, interval_runtime: 20.4867, interval_samples_per_second: 3.905, interval_steps_per_second: 0.244, epoch: 5.0
[2023-07-21 15:52:23,997] [ INFO] - ***** Running Evaluation *****
[2023-07-21 15:52:23,997] [ INFO] - Num examples = 35
[2023-07-21 15:52:23,997] [ INFO] - Total prediction steps = 3
[2023-07-21 15:52:23,997] [ INFO] - Pre device batch size = 16
[2023-07-21 15:52:23,997] [ INFO] - Total Batch size = 16
[2023-07-21 15:52:33,805] [ INFO] - eval_loss: 0.0011874400079250336, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 9.8078, eval_samples_per_second: 3.569, eval_steps_per_second: 0.306, epoch: 5.0
[2023-07-21 15:52:33,806] [ INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-215
[2023-07-21 15:52:33,808] [ INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-215/config.json
[2023-07-21 15:52:36,141] [ INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-215/tokenizer_config.json
[2023-07-21 15:52:36,141] [ INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-215/special_tokens_map.json
[2023-07-21 15:52:41,717] [ INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-200] due to args.save_total_limit
[2023-07-21 15:52:42,252] [ INFO] -
Training completed.
[2023-07-21 15:52:42,252] [ INFO] - Loading best model from ./checkpoint/model_best/checkpoint-75 (score: 0.9354838709677418).
[2023-07-21 15:52:43,847] [ INFO] - train_runtime: 969.9908, train_samples_per_second: 3.536, train_steps_per_second: 0.222, train_loss: 0.0003774468271267535, epoch: 5.0
[2023-07-21 15:52:43,915] [ INFO] - Saving model checkpoint to ./checkpoint/model_best
[2023-07-21 15:52:43,917] [ INFO] - Configuration saved in ./checkpoint/model_best/config.json
[2023-07-21 15:52:46,306] [ INFO] - tokenizer config file saved in ./checkpoint/model_best/tokenizer_config.json
[2023-07-21 15:52:46,306] [ INFO] - Special tokens file saved in ./checkpoint/model_best/special_tokens_map.json
[2023-07-21 15:52:46,314] [ INFO] - ***** train metrics *****
[2023-07-21 15:52:46,315] [ INFO] - epoch = 5.0
[2023-07-21 15:52:46,315] [ INFO] - train_loss = 0.0004
[2023-07-21 15:52:46,315] [ INFO] - train_runtime = 0:16:09.99
[2023-07-21 15:52:46,315] [ INFO] - train_samples_per_second = 3.536
[2023-07-21 15:52:46,315] [ INFO] - train_steps_per_second = 0.222
[2023-07-21 15:52:46,318] [ INFO] - ***** Running Evaluation *****
[2023-07-21 15:52:46,318] [ INFO] - Num examples = 35
[2023-07-21 15:52:46,318] [ INFO] - Total prediction steps = 3
[2023-07-21 15:52:46,318] [ INFO] - Pre device batch size = 16
[2023-07-21 15:52:46,318] [ INFO] - Total Batch size = 16
[2023-07-21 15:52:55,755] [ INFO] - eval_loss: 0.001322056632488966, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 9.4374, eval_samples_per_second: 3.709, eval_steps_per_second: 0.318, epoch: 5.0
[2023-07-21 15:52:55,756] [ INFO] - ***** eval metrics *****
[2023-07-21 15:52:55,756] [ INFO] - epoch = 5.0
[2023-07-21 15:52:55,756] [ INFO] - eval_f1 = 0.9355
[2023-07-21 15:52:55,756] [ INFO] - eval_loss = 0.0013
[2023-07-21 15:52:55,756] [ INFO] - eval_precision = 0.9508
[2023-07-21 15:52:55,756] [ INFO] - eval_recall = 0.9206
[2023-07-21 15:52:55,756] [ INFO] - eval_runtime = 0:00:09.43
[2023-07-21 15:52:55,756] [ INFO] - eval_samples_per_second = 3.709
[2023-07-21 15:52:55,756] [ INFO] - eval_steps_per_second = 0.318
[2023-07-21 15:52:55,759] [ INFO] - Exporting inference model to ./checkpoint/model_best/model
[2023-07-21 15:53:55,567] [ INFO] - Inference model exported.
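The training log says the best model was loaded from checkpoint-75. The final evaluation at step 215 produced exactly the same eval_f1 (0.9354838709677418), yet it was not chosen. The sketch below shows one rule that produces this result. It assumes PaddleNLP's Trainer follows the same strict-improvement rule as Hugging Face's Trainer when `--metric_for_best_model eval_f1` is set: a later checkpoint replaces the current best only if its score is strictly higher, so a tie keeps the earlier checkpoint. `select_best_checkpoint` is a hypothetical helper and not a PaddleNLP API.

```python
from typing import Optional, Sequence, Tuple


def select_best_checkpoint(
    evals: Sequence[Tuple[int, float]], greater_is_better: bool = True
) -> Optional[int]:
    """Return the step with the best metric, keeping the earliest step on ties.

    Hypothetical helper that mirrors the assumed strict-improvement rule;
    it is not part of the PaddleNLP API.
    """
    best_step, best_metric = None, None
    for step, metric in evals:
        if best_metric is None:
            improved = True
        elif greater_is_better:
            improved = metric > best_metric  # strict: an equal score does not replace the best
        else:
            improved = metric < best_metric  # e.g. when the tracked metric is a loss
        if improved:
            best_step, best_metric = step, metric
    return best_step


# The two eval_f1 values visible in the log (steps 75 and 215).
# Intermediate evaluations are left out because their scores are not shown here.
evals = [(75, 0.9354838709677418), (215, 0.9354838709677418)]
print(select_best_checkpoint(evals))  # → 75
```

With `--save_total_limit 1`, older checkpoints such as checkpoint-200 are deleted. The best checkpoint is still retained so that `--load_best_model_at_end True` can reload it, which matches the log.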
!python evaluate.py \
--device "gpu" \
--model_path ./checkpoint/model_best \
--test_path ./medical_checklist/dev.txt \
--output_dir ./checkpoint/model_best \
--label_names 'start_positions' 'end_positions' \
--max_seq_len 512 \
--per_device_eval_batch_size 16
[2023-07-21 15:55:25,012] [ INFO] - The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2023-07-21 15:55:25,012] [ INFO] - ============================================================
[2023-07-21 15:55:25,013] [ INFO] - Model Configuration Arguments
[2023-07-21 15:55:25,013] [ INFO] - paddle commit id :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:55:25,013] [ INFO] - model_path :./checkpoint/model_best
[2023-07-21 15:55:25,013] [ INFO] -
[2023-07-21 15:55:25,013] [ INFO] - ============================================================
[2023-07-21 15:55:25,013] [ INFO] - Data Configuration Arguments
[2023-07-21 15:55:25,013] [ INFO] - paddle commit id :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:55:25,013] [ INFO] - debug :False
[2023-07-21 15:55:25,013] [ INFO] - max_seq_len :512
[2023-07-21 15:55:25,013] [ INFO] - schema_lang :ch
[2023-07-21 15:55:25,013] [ INFO] - test_path :./medical_checklist/dev.txt
[2023-07-21 15:55:25,013] [ INFO] -
[2023-07-21 15:55:25,014] [ INFO] - We are using <class 'paddlenlp.transformers.ernie_layout.tokenizer.ErnieLayoutTokenizer'> to load './checkpoint/model_best'.
[2023-07-21 15:55:25,693] [ INFO] - loading configuration file ./checkpoint/model_best/config.json
[2023-07-21 15:55:25,694] [ INFO] - Model config ErnieLayoutConfig {
"architectures": [
"UIEX"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"coordinate_size": 128,
"dtype": "float32",
"enable_recompute": false,
"eos_token_id": 2,
"fuse": false,
"gradient_checkpointing": false,
"has_relative_attention_bias": true,
"has_spatial_attention_bias": true,
"has_visual_segment_embedding": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"image_feature_pool_shape": [
7,
7,
256
],
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_2d_position_embeddings": 1024,
"max_position_embeddings": 514,
"max_rel_2d_pos": 256,
"max_rel_pos": 128,
"model_type": "ernie_layout",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"output_past": true,
"pad_token_id": 1,
"paddlenlp_version": null,
"pool_act": "tanh",
"rel_2d_pos_bins": 64,
"rel_pos_bins": 32,
"shape_size": 128,
"task_id": 0,
"task_type_vocab_size": 3,
"type_vocab_size": 100,
"use_task_id": true,
"vocab_size": 250002
}
W0721 15:55:29.126700 3399 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0721 15:55:29.130168 3399 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
[2023-07-21 15:55:31,058] [ INFO] - All model checkpoint weights were used when initializing UIEX.
[2023-07-21 15:55:31,058] [ INFO] - All the weights of UIEX were initialized from the model checkpoint at ./checkpoint/model_best.
If your task is similar to the task the model of the checkpoint was trained on, you can already use UIEX for predictions without further training.
[2023-07-21 15:55:31,259] [ INFO] - ============================================================
[2023-07-21 15:55:31,259] [ INFO] - Training Configuration Arguments
[2023-07-21 15:55:31,259] [ INFO] - paddle commit id :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:55:31,260] [ INFO] - _no_sync_in_gradient_accumulation:True
[2023-07-21 15:55:31,260] [ INFO] - adam_beta1 :0.9
[2023-07-21 15:55:31,260] [ INFO] - adam_beta2 :0.999
[2023-07-21 15:55:31,260] [ INFO] - adam_epsilon :1e-08
[2023-07-21 15:55:31,260] [ INFO] - bf16 :False
[2023-07-21 15:55:31,260] [ INFO] - bf16_full_eval :False
[2023-07-21 15:55:31,260] [ INFO] - current_device :gpu:0
[2023-07-21 15:55:31,260] [ INFO] - dataloader_drop_last :False
[2023-07-21 15:55:31,260] [ INFO] - dataloader_num_workers :0
[2023-07-21 15:55:31,260] [ INFO] - device :gpu
[2023-07-21 15:55:31,260] [ INFO] - disable_tqdm :False
[2023-07-21 15:55:31,260] [ INFO] - do_eval :False
[2023-07-21 15:55:31,260] [ INFO] - do_export :False
[2023-07-21 15:55:31,260] [ INFO] - do_predict :False
[2023-07-21 15:55:31,260] [ INFO] - do_train :False
[2023-07-21 15:55:31,260] [ INFO] - eval_batch_size :16
[2023-07-21 15:55:31,261] [ INFO] - eval_steps :None
[2023-07-21 15:55:31,261] [ INFO] - evaluation_strategy :IntervalStrategy.NO
[2023-07-21 15:55:31,261] [ INFO] - flatten_param_grads :False
[2023-07-21 15:55:31,261] [ INFO] - fp16 :False
[2023-07-21 15:55:31,261] [ INFO] - fp16_full_eval :False
[2023-07-21 15:55:31,261] [ INFO] - fp16_opt_level :O1
[2023-07-21 15:55:31,261] [ INFO] - gradient_accumulation_steps :1
[2023-07-21 15:55:31,261] [ INFO] - greater_is_better :None
[2023-07-21 15:55:31,261] [ INFO] - ignore_data_skip :False
[2023-07-21 15:55:31,261] [ INFO] - label_names :['start_positions', 'end_positions']
[2023-07-21 15:55:31,261] [ INFO] - lazy_data_processing :True
[2023-07-21 15:55:31,261] [ INFO] - learning_rate :5e-05
[2023-07-21 15:55:31,261] [ INFO] - load_best_model_at_end :False
[2023-07-21 15:55:31,261] [ INFO] - local_process_index :0
[2023-07-21 15:55:31,261] [ INFO] - local_rank :-1
[2023-07-21 15:55:31,261] [ INFO] - log_level :-1
[2023-07-21 15:55:31,261] [ INFO] - log_level_replica :-1
[2023-07-21 15:55:31,261] [ INFO] - log_on_each_node :True
[2023-07-21 15:55:31,261] [ INFO] - logging_dir :./checkpoint/model_best/runs/Jul21_15-55-25_jupyter-2631487-6518069
[2023-07-21 15:55:31,262] [ INFO] - logging_first_step :False
[2023-07-21 15:55:31,262] [ INFO] - logging_steps :500
[2023-07-21 15:55:31,262] [ INFO] - logging_strategy :IntervalStrategy.STEPS
[2023-07-21 15:55:31,262] [ INFO] - lr_scheduler_type :SchedulerType.LINEAR
[2023-07-21 15:55:31,262] [ INFO] - max_grad_norm :1.0
[2023-07-21 15:55:31,262] [ INFO] - max_steps :-1
[2023-07-21 15:55:31,262] [ INFO] - metric_for_best_model :None
[2023-07-21 15:55:31,262] [ INFO] - minimum_eval_times :None
[2023-07-21 15:55:31,262] [ INFO] - no_cuda :False
[2023-07-21 15:55:31,262] [ INFO] - num_train_epochs :3.0
[2023-07-21 15:55:31,262] [ INFO] - optim :OptimizerNames.ADAMW
[2023-07-21 15:55:31,262] [ INFO] - output_dir :./checkpoint/model_best
[2023-07-21 15:55:31,262] [ INFO] - overwrite_output_dir :False
[2023-07-21 15:55:31,262] [ INFO] - past_index :-1
[2023-07-21 15:55:31,262] [ INFO] - per_device_eval_batch_size :16
[2023-07-21 15:55:31,262] [ INFO] - per_device_train_batch_size :8
[2023-07-21 15:55:31,262] [ INFO] - prediction_loss_only :False
[2023-07-21 15:55:31,262] [ INFO] - process_index :0
[2023-07-21 15:55:31,262] [ INFO] - recompute :False
[2023-07-21 15:55:31,262] [ INFO] - remove_unused_columns :True
[2023-07-21 15:55:31,262] [ INFO] - report_to :['visualdl']
[2023-07-21 15:55:31,262] [ INFO] - resume_from_checkpoint :None
[2023-07-21 15:55:31,262] [ INFO] - run_name :./checkpoint/model_best
[2023-07-21 15:55:31,262] [ INFO] - save_on_each_node :False
[2023-07-21 15:55:31,262] [ INFO] - save_steps :500
[2023-07-21 15:55:31,263] [ INFO] - save_strategy :IntervalStrategy.STEPS
[2023-07-21 15:55:31,263] [ INFO] - save_total_limit :None
[2023-07-21 15:55:31,263] [ INFO] - scale_loss :32768
[2023-07-21 15:55:31,263] [ INFO] - seed :42
[2023-07-21 15:55:31,263] [ INFO] - sharding :[]
[2023-07-21 15:55:31,263] [ INFO] - sharding_degree :-1
[2023-07-21 15:55:31,263] [ INFO] - should_log :True
[2023-07-21 15:55:31,263] [ INFO] - should_save :True
[2023-07-21 15:55:31,263] [ INFO] - skip_memory_metrics :True
[2023-07-21 15:55:31,263] [ INFO] - train_batch_size :8
[2023-07-21 15:55:31,263] [ INFO] - warmup_ratio :0.0
[2023-07-21 15:55:31,263] [ INFO] - warmup_steps :0
[2023-07-21 15:55:31,263] [ INFO] - weight_decay :0.0
[2023-07-21 15:55:31,263] [ INFO] - world_size :1
[2023-07-21 15:55:31,263] [ INFO] -
[2023-07-21 15:55:31,263] [ INFO] - ***** Running Evaluation *****
[2023-07-21 15:55:31,263] [ INFO] - Num examples = 35
[2023-07-21 15:55:31,263] [ INFO] - Total prediction steps = 3
[2023-07-21 15:55:31,263] [ INFO] - Pre device batch size = 16
[2023-07-21 15:55:31,264] [ INFO] - Total Batch size = 16
100%|█████████████████████████████████████████████| 3/3 [00:03<00:00, 1.31s/it]
[2023-07-21 15:55:41,222] [ INFO] - -----Evaluate model-------
[2023-07-21 15:55:41,222] [ INFO] - Class Name: ALL CLASSES
[2023-07-21 15:55:41,222] [ INFO] - Evaluation Precision: 0.95082 | Recall: 0.92063 | F1: 0.93548
[2023-07-21 15:55:41,222] [ INFO] - -----------------------------
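`evaluate.py` reports span-level precision, recall and F1 on the 35 dev examples. F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the full-precision values logged during training, using only the standard definition F1 = 2PR / (P + R).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Full-precision values from the eval_precision / eval_recall log line
precision = 0.9508196721311475
recall = 0.9206349206349206
f1 = f1_score(precision, recall)
print(f"F1: {f1:.5f}")  # → F1: 0.93548
```

A harmonic mean never exceeds the arithmetic mean, so F1 is pulled slightly toward the lower of the two scores, which here is recall.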
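from pprint import pprint
from paddlenlp import Taskflow

# Schema: extract each test item name (项目名称) together with its
# result (结果), unit (单位) and reference range (参考范围).
# The labels should match the ones used for annotation and fine-tuning.
schema = {
    '项目名称': [
        '结果',
        '单位',
        '参考范围'
    ]
}
# task_path loads the fine-tuned checkpoint instead of the stock uie-x-base weights
my_ie = Taskflow("information_extraction", model="uie-x-base", schema=schema, task_path='./checkpoint/model_best')
# Run document extraction on a checklist image
pprint(my_ie({"doc": "test.jpg"}))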
[{'项目名称': [{'bbox': [[417, 598, 764, 653]],
'end': 161,
'probability': 0.9931185709767476,
'relations': {'单位': [{'bbox': [[1383, 603, 1475, 653]],
'end': 170,
'probability': 0.9982062669088805,
'start': 166,
'text': 'ng/L'}],
'参考范围': [{'bbox': [[1603, 603, 1717, 650]],
'end': 175,
'probability': 0.994915152253455,
'start': 170,
'text': '0-0.2'}],
'结果': [{'bbox': [[1055, 608, 1161, 647]],
'end': 166,
'probability': 0.9779773840612904,
'start': 161,
'text': '0.000'}]},
'start': 150,
'text': '乙肝表面抗原HBsAg'},
{'bbox': [[420, 803, 807, 850]],
'end': 263,
'probability': 0.9839514684545492,
'relations': {'单位': [{'bbox': [[1382, 800, 1481, 856]],
'end': 272,
'probability': 0.9902134016753692,
'start': 268,
'text': 'U/mL'}],
'参考范围': [{'bbox': [[1609, 806, 1717, 845]],
'end': 277,
'probability': 0.9948578061238109,
'start': 272,
'text': '0-0.2'}],
'结果': [{'bbox': [[1055, 806, 1163, 853]],
'end': 268,
'probability': 0.9997722031372689,
'start': 263,
'text': '0.081'}]},
'start': 248,
'text': '乙肝e抗体Anti-HBeAB'},
{'bbox': [[417, 671, 863, 718]],
'end': 197,
'probability': 0.9933030680080606,
'relations': {'单位': [{'bbox': [[1383, 671, 1512, 717]],
'end': 208,
'probability': 0.993252639775573,
'start': 202,
'text': 'MIU/mL'}],
'参考范围': [{'bbox': [[1603, 671, 1697, 717]],
'end': 212,
'probability': 0.9968451209051636,
'start': 208,
'text': '0-10'}],
'结果': [{'bbox': [[1055, 676, 1163, 715]],
'end': 202,
'probability': 0.9627551951018489,
'start': 197,
'text': '0.000'}]},
'start': 181,
'text': '乙肝表面抗体Anti-HBsAB'},
{'bbox': [[420, 735, 706, 785]],
'end': 228,
'probability': 0.9925530039269148,
'relations': {'单位': [{'bbox': [[1383, 738, 1475, 785]],
'end': 237,
'probability': 0.9953925121749307,
'start': 233,
'text': 'U/mL'}],
'参考范围': [{'bbox': [[1606, 741, 1715, 780]],
'end': 242,
'probability': 0.9982005347972311,
'start': 237,
'text': '0-0.5'}],
'结果': [{'bbox': [[1057, 743, 1163, 782]],
'end': 233,
'probability': 0.9943726871306069,
'start': 228,
'text': '0.000'}]},
'start': 218,
'text': '乙肝e抗原HBeAg'},
{'bbox': [[420, 871, 870, 918]],
'end': 299,
'probability': 0.9931226228703274,
'relations': {'单位': [{'bbox': [[1389, 871, 1477, 918]],
'end': 308,
'probability': 0.9990609045893919,
'start': 304,
'text': 'U/mL'}],
'参考范围': [{'bbox': [[1611, 873, 1717, 912]],
'end': 313,
'probability': 0.9937555165322465,
'start': 308,
'text': '0-0.9'}],
'结果': [{'bbox': [[1054, 867, 1169, 921]],
'end': 304,
'probability': 0.9996564084931308,
'start': 299,
'text': '1.053'}]},
'start': 283,
'text': '乙肝核心抗体Anti-HBcAB'},
{'bbox': [[415, 536, 794, 580]],
'end': 130,
'probability': 0.9905078246100985,
'relations': {'单位': [{'bbox': [[1383, 536, 1475, 585]],
'end': 139,
'probability': 0.9996564019316949,
'start': 135,
'text': 's/co'}],
'参考范围': [{'bbox': [[1603, 533, 1745, 588]],
'end': 144,
'probability': 0.9937541085628041,
'start': 139,
'text': '阴性(-)'}],
'结果': [{'bbox': [[1055, 536, 1194, 582]],
'end': 135,
'probability': 0.9912728416351548,
'start': 130,
'text': '阴性(-)'}]},
'start': 118,
'text': '乙肝病毒前S1抗原HBV'}]}]
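For structured entry into a records system, the nested Taskflow output usually needs to be flattened into one record per test item. The sketch below does that. It assumes the result layout shown above: a list with one dict per document, keyed by the schema's entity name, where each entity carries a `relations` dict of candidate lists. `flatten_checklist` and the English column names are hypothetical. The sample input is the first item of the output above, trimmed to the fields the function uses.

```python
from typing import Any, Dict, List, Optional

# Hypothetical English column names for the schema's relation labels
RELATION_COLUMNS = {"结果": "result", "单位": "unit", "参考范围": "reference_range"}


def flatten_checklist(
    results: List[Dict[str, Any]], entity: str = "项目名称"
) -> List[Dict[str, Optional[str]]]:
    """Turn Taskflow's nested output into one flat row per extracted test item."""
    rows = []
    for doc in results:  # one dict per input document
        for item in doc.get(entity, []):
            row: Dict[str, Optional[str]] = {"item": item["text"]}
            relations = item.get("relations", {})
            for label, column in RELATION_COLUMNS.items():
                candidates = relations.get(label, [])
                # Keep the highest-probability candidate; None if nothing was extracted
                best = max(candidates, key=lambda c: c["probability"], default=None)
                row[column] = best["text"] if best else None
            rows.append(row)
    return rows


# First item of the output above, trimmed to the fields used here
sample = [{"项目名称": [{
    "text": "乙肝表面抗原HBsAg",
    "probability": 0.9931185709767476,
    "relations": {
        "单位": [{"text": "ng/L", "probability": 0.9982062669088805}],
        "参考范围": [{"text": "0-0.2", "probability": 0.994915152253455}],
        "结果": [{"text": "0.000", "probability": 0.9779773840612904}],
    },
}]}]

for row in flatten_checklist(sample):
    print(row)
# → {'item': '乙肝表面抗原HBsAg', 'result': '0.000', 'unit': 'ng/L', 'reference_range': '0-0.2'}
```

Missing relations become `None` rather than raising an error. This makes it easy to route incomplete rows to manual review.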
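Visualize the extraction results on the image:
import matplotlib.pyplot as plt
from paddlenlp.utils.doc_parser import DocParser

# Re-run extraction with the Taskflow instance created above
results = my_ie({"doc": "test.jpg"})
# Draw the bounding boxes of the extracted spans onto the original image and return the annotated image
img_show = DocParser.write_image_with_results(
    "test.jpg",
    result=results[0],
    return_image=True)
plt.figure(figsize=(15, 15))
plt.imshow(img_show)
plt.show()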
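Each extracted span also carries a `bbox` in image pixel coordinates, which is what the visualization draws. On a tabular checklist, a result, unit and reference range should sit on the same printed row as their test item. A simple vertical-overlap check can therefore flag suspicious relations for manual review. This is a hypothetical post-processing heuristic and not part of UIE-X or Taskflow. It assumes each box is `[x1, y1, x2, y2]` with y growing downward, which is consistent with the coordinates in the output above. The 0.5 threshold is an arbitrary choice.

```python
from typing import Sequence

Box = Sequence[float]  # assumed layout: [x1, y1, x2, y2], y grows downward


def vertical_overlap_ratio(a: Box, b: Box) -> float:
    """Overlap of the two boxes' y-ranges, relative to the shorter box's height."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    if shorter <= 0:
        return 0.0
    return max(0.0, overlap) / shorter


def same_row(a: Box, b: Box, threshold: float = 0.5) -> bool:
    """Hypothetical heuristic: two spans share a table row if their y-ranges mostly overlap."""
    return vertical_overlap_ratio(a, b) >= threshold


# Boxes copied from the output above
item = [417, 598, 764, 653]                # 乙肝表面抗原HBsAg
its_result = [1055, 608, 1161, 647]        # its 结果 (0.000)
other_row_result = [1055, 806, 1163, 853]  # 结果 of 乙肝e抗体Anti-HBeAB
print(same_row(item, its_result))          # → True
print(same_row(item, other_row_result))    # → False
```

Because `bbox` is a list of boxes (a span may cover several), a full check would compare `item['bbox'][0]` against `rel['bbox'][0]` for every relation candidate under `results[0]['项目名称']`.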
Project page (AI Studio): https://aistudio.baidu.com/aistudio/projectdetail/6518069?sUid=2631487&shared=1&ts=1690163802670