| 2020-09-28 13:11:05,720 loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/tiger/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084 | |
| 2020-09-28 13:11:06,354 Loading pretrained backbone weights from hdfs:///home/byte_arnold_hl_vc/user/chenjiacheng/data/coco/original_updown/original_updown_backbone.pth for backbone source vsepp_detector | |
| 2020-09-28 13:11:17,472 Resnet backbone now has fixed blocks 2 | |
| 2020-09-28 13:11:18,889 loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json from cache at /home/tiger/.cache/torch/transformers/4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.7156163d5fdc189c3016baca0775ffce230789d7fa2a42ef516483e4ca884517 | |
| 2020-09-28 13:11:18,889 Model config { | |
| "architectures": [ | |
| "BertForMaskedLM" | |
| ], | |
| "attention_probs_dropout_prob": 0.1, | |
| "finetuning_task": null, | |
| "hidden_act": "gelu", | |
| "hidden_dropout_prob": 0.1, | |
| "hidden_size": 768, | |
| "initializer_range": 0.02, | |
| "intermediate_size": 3072, | |
| "layer_norm_eps": 1e-12, | |
| "max_position_embeddings": 512, | |
| "model_type": "bert", | |
| "num_attention_heads": 12, | |
| "num_hidden_layers": 12, | |
| "num_labels": 2, | |
| "output_attentions": false, | |
| "output_hidden_states": false, | |
| "pad_token_id": 0, | |
| "pruned_heads": {}, | |
| "torchscript": false, | |
| "type_vocab_size": 2, | |
| "use_bfloat16": false, | |
| "vocab_size": 30522 | |
| } | |
| 2020-09-28 13:11:20,205 loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin from cache at /home/tiger/.cache/torch/transformers/aa1ef1aede4482d0dbcd4d52baad8ae300e60902e88fcb0bebdec09afd232066.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157 | |
| 2020-09-28 13:11:22,425 Use adam as the optimizer, with init lr 0.0005 | |
| 2020-09-28 13:11:22,426 Image encoder is data paralleled now. | |
| 2020-09-28 13:11:22,566 Load full model with backbone | |
| 2020-09-28 13:11:22,568 Loading dataset | |
| 2020-09-28 13:11:27,156 Input mode small: scaled by factor 2.0 | |
| 2020-09-28 13:11:43,512 Computing results... | |
| 2020-09-28 13:12:40,848 Test: [0/40] Le 64.1531 (64.1531) Time 57.332 (0.000) | |
| 2020-09-28 13:12:46,101 Test: [10/40] Le 63.1351 (61.8695) Time 0.564 (0.000) | |
| 2020-09-28 13:12:51,829 Test: [20/40] Le 60.7486 (61.3324) Time 0.591 (0.000) | |
| 2020-09-28 13:12:57,364 Test: [30/40] Le 60.9106 (61.2955) Time 0.576 (0.000) | |
| 2020-09-28 13:13:03,463 Images: 1000, Captions: 5000 | |
| 2020-09-28 13:13:05,111 rsum: 522.3 | |
| 2020-09-28 13:13:05,111 Average i2t Recall: 92.4 | |
| 2020-09-28 13:13:05,111 Image to text: 81.5 97.1 98.5 1.0 2.0 | |
| 2020-09-28 13:13:05,111 Average t2i Recall: 81.7 | |
| 2020-09-28 13:13:05,111 Text to image: 63.7 88.3 93.2 1.0 5.6 | |
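
The summary lines above are consistent with each other, which can be checked with a small sketch. It assumes the common VSE++-style convention that the first three numbers on the `Image to text` and `Text to image` lines are R@1, R@5 and R@10 (the last two would then be median and mean rank), and that `rsum` is the sum of those six recall values. The log itself does not state this, and the function name `summarize` is hypothetical.

```python
# Recall values copied from the log lines above.
# Assumption: the first three numbers per line are R@1, R@5, R@10.
i2t_recalls = [81.5, 97.1, 98.5]  # "Image to text"
t2i_recalls = [63.7, 88.3, 93.2]  # "Text to image"


def summarize(i2t, t2i):
    """Return (rsum, mean i2t recall, mean t2i recall), rounded to one decimal."""
    rsum = sum(i2t) + sum(t2i)
    avg_i2t = sum(i2t) / len(i2t)
    avg_t2i = sum(t2i) / len(t2i)
    return round(rsum, 1), round(avg_i2t, 1), round(avg_t2i, 1)


rsum, avg_i2t, avg_t2i = summarize(i2t_recalls, t2i_recalls)
print(f"rsum: {rsum}")
print(f"Average i2t Recall: {avg_i2t}")
print(f"Average t2i Recall: {avg_t2i}")
```

Under these assumptions the printed values match the logged `rsum: 522.3`, `Average i2t Recall: 92.4` and `Average t2i Recall: 81.7`.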