ModernBERT-base fine-tuned on GooAQ

This is a Cross Encoder model fine-tuned from answerdotai/ModernBERT-base using the sentence-transformers library. It computes a score for each pair of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: answerdotai/ModernBERT-base
  • Maximum Sequence Length: 256 tokens
  • Number of Output Labels: 1 label
  • Language: en
  • License: apache-2.0

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("bnkc123/modernbert-base-gooaq-bce")
# Get scores for pairs of texts
pairs = [
    ['how many blocks can a bag of cement mould in nigeria?', '1 bag of cement produces 50 blocks so let us do the calculation,15 bags will produce 15 x 50 = 750 pieces of 6 inches blocks. So with a double tipper of sand and 15 bags of cement you will get 750 blocks.'],
    ['how many blocks can a bag of cement mould in nigeria?', 'Wood, cement, aggregates, metals, bricks, concrete, clay are the most common type of building material used in construction. The choice of these are based on their cost effectiveness for building projects.'],
    ['how many blocks can a bag of cement mould in nigeria?', 'x 16 in. Concrete Blocks are a great choice for the construction of your next masonry project. Concrete Block construction provides durability,fire resistance and thermal mass which adds to energy efficiency. Concrete block also provide high resistance to sound penetration.'],
    ['how many blocks can a bag of cement mould in nigeria?', "['Cement or lime concrete.', 'Bricks.', 'Flagstones.', 'Marble.', 'Glass.', 'Ceramic.', 'Plastic.', 'Mud and murram.']"],
    ['how many blocks can a bag of cement mould in nigeria?', "Here's the thing: Even though reusable bags are multi-use, and often made of recycled fabric, they are usually not recyclable."],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'how many blocks can a bag of cement mould in nigeria?',
    [
        '1 bag of cement produces 50 blocks so let us do the calculation,15 bags will produce 15 x 50 = 750 pieces of 6 inches blocks. So with a double tipper of sand and 15 bags of cement you will get 750 blocks.',
        'Wood, cement, aggregates, metals, bricks, concrete, clay are the most common type of building material used in construction. The choice of these are based on their cost effectiveness for building projects.',
        'x 16 in. Concrete Blocks are a great choice for the construction of your next masonry project. Concrete Block construction provides durability,fire resistance and thermal mass which adds to energy efficiency. Concrete block also provide high resistance to sound penetration.',
        "['Cement or lime concrete.', 'Bricks.', 'Flagstones.', 'Marble.', 'Glass.', 'Ceramic.', 'Plastic.', 'Mud and murram.']",
        "Here's the thing: Even though reusable bags are multi-use, and often made of recycled fabric, they are usually not recyclable.",
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
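The ranking result can be mapped back to the original passages through `corpus_id`. The following is a minimal sketch, assuming the list-of-dicts shape shown in the comment above; the scores, the passages, and the helper name `top_k_texts` are placeholders for illustration, not real model output.

```python
# Placeholder data in the shape returned by model.rank(); the scores are made up.
ranks = [
    {"corpus_id": 0, "score": 0.9},
    {"corpus_id": 2, "score": 0.2},
    {"corpus_id": 1, "score": 0.1},
]
documents = ["passage A", "passage B", "passage C"]


def top_k_texts(ranks, documents, k=2):
    """Return the k highest-scoring passages (hypothetical helper)."""
    # Sort defensively by score, highest first, then look up each passage.
    ordered = sorted(ranks, key=lambda r: r["score"], reverse=True)
    return [documents[r["corpus_id"]] for r in ordered[:k]]


print(top_k_texts(ranks, documents))
```

Each `corpus_id` is an index into the list of passages that was passed to `rank`, so the lookup works only against that same list.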

Evaluation

Metrics

Cross Encoder Reranking

Metric Value
map 0.8711 (+0.0572)
mrr@10 0.8702 (+0.0576)
ndcg@10 0.8945 (+0.0451)
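
For readers unfamiliar with these metrics, here is a minimal sketch of MRR@10 and NDCG@10 for a single query with binary relevance labels. It uses the textbook definitions; the evaluator that produced the numbers above may differ in details, and the function names are hypothetical.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item within the top k."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    def dcg(rels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

    ideal_dcg = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


# Relevance labels in ranked order: the only relevant passage sits at rank 2.
ranked = [0, 1, 0, 0, 0]
print(mrr_at_k(ranked))   # 0.5
print(ndcg_at_k(ranked))  # 1 / log2(3)
```

The reported values would be averages of such per-query scores over the evaluation set.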

Training Details

Training Dataset

Unnamed Dataset

  • Size: 54,000 training samples
  • Columns: question, answer, and label
  • Approximate statistics based on the first 1000 samples:
    • question (string): min 16 characters, mean 42.45 characters, max 86 characters
    • answer (string): min 53 characters, mean 254.92 characters, max 393 characters
    • label (int): 0: ~83.30%, 1: ~16.70%
  • Samples:
    • question: how many blocks can a bag of cement mould in nigeria?
      answer: 1 bag of cement produces 50 blocks so let us do the calculation,15 bags will produce 15 x 50 = 750 pieces of 6 inches blocks. So with a double tipper of sand and 15 bags of cement you will get 750 blocks.
      label: 1
    • question: how many blocks can a bag of cement mould in nigeria?
      answer: Wood, cement, aggregates, metals, bricks, concrete, clay are the most common type of building material used in construction. The choice of these are based on their cost effectiveness for building projects.
      label: 0
    • question: how many blocks can a bag of cement mould in nigeria?
      answer: x 16 in. Concrete Blocks are a great choice for the construction of your next masonry project. Concrete Block construction provides durability,fire resistance and thermal mass which adds to energy efficiency. Concrete block also provide high resistance to sound penetration.
      label: 0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": 3
    }
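
The effect of `pos_weight` can be illustrated with a minimal pure-Python sketch. It assumes the loss is the standard weighted binary cross-entropy computed on raw logits (the formulation used by PyTorch's `BCEWithLogitsLoss`); the function name is hypothetical and this is not the library's own code.

```python
import math


def weighted_bce_with_logits(logit, label, pos_weight=3.0):
    """Weighted binary cross-entropy for one (logit, label) pair."""
    def softplus(x):
        # Numerically stable log(1 + exp(x)).
        return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

    # -log(sigmoid(x)) == softplus(-x); -log(1 - sigmoid(x)) == softplus(x)
    return pos_weight * label * softplus(-logit) + (1 - label) * softplus(logit)


# At logit 0 the model is undecided: a positive pair costs 3x a negative pair.
print(weighted_bce_with_logits(0.0, 1))  # 3 * ln(2)
print(weighted_bce_with_logits(0.0, 0))  # ln(2)
```

Weighting positives more heavily is a common way to compensate for a label distribution skewed towards negatives, as in the statistics above.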
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 8
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • dataloader_num_workers: 4
  • load_best_model_at_end: True
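
As a worked example, these settings determine the effective batch size and the approximate number of optimizer steps. The sketch below assumes single-device training (not stated in this card) and takes `per_device_train_batch_size: 8` from the full hyperparameter list; the variable names are illustrative.

```python
train_samples = 54_000
per_device_train_batch_size = 8
gradient_accumulation_steps = 8
warmup_ratio = 0.1
num_devices = 1  # assumption: the card does not state the device count

# Samples consumed per optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
# Approximate optimizer steps in the single training epoch.
steps_per_epoch = train_samples / effective_batch_size
# Approximate number of steps spent in learning-rate warmup.
warmup_steps = warmup_ratio * steps_per_epoch

print(effective_batch_size)                # 64
print(steps_per_epoch)                     # 843.75
print(round(800 / steps_per_epoch, 4))     # 0.9481
```

Under this assumption, step 800 corresponds to epoch 0.9481, which matches the last row of the training logs, and warmup covers roughly the first 84 steps.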

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 8
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss gooaq-dev_ndcg@10
-1 -1 - 0.1346 (-0.7148)
0.0012 1 0.7435 -
0.0119 10 0.8952 -
0.0237 20 0.8604 -
0.0356 30 0.8869 -
0.0474 40 0.924 -
0.0593 50 0.8146 -
0.0711 60 0.9116 -
0.0830 70 0.8595 -
0.0948 80 0.8881 -
0.1067 90 0.8793 -
0.1185 100 0.8568 -
0.1304 110 0.8389 -
0.1422 120 0.8486 -
0.1541 130 0.8219 -
0.1659 140 0.8428 -
0.1778 150 0.8187 -
0.1896 160 0.7387 -
0.2015 170 0.658 -
0.2133 180 0.6728 -
0.2252 190 0.6725 -
0.2370 200 0.5657 0.8263 (-0.0231)
0.2489 210 0.517 -
0.2607 220 0.4983 -
0.2726 230 0.5309 -
0.2844 240 0.4927 -
0.2963 250 0.5733 -
0.3081 260 0.5188 -
0.32 270 0.5496 -
0.3319 280 0.4925 -
0.3437 290 0.5078 -
0.3556 300 0.5287 -
0.3674 310 0.4579 -
0.3793 320 0.4382 -
0.3911 330 0.4201 -
0.4030 340 0.4193 -
0.4148 350 0.4398 -
0.4267 360 0.3959 -
0.4385 370 0.4356 -
0.4504 380 0.4551 -
0.4622 390 0.4156 -
0.4741 400 0.3969 0.8758 (+0.0264)
0.4859 410 0.3614 -
0.4978 420 0.4567 -
0.5096 430 0.3743 -
0.5215 440 0.45 -
0.5333 450 0.4246 -
0.5452 460 0.39 -
0.5570 470 0.4236 -
0.5689 480 0.3827 -
0.5807 490 0.3516 -
0.5926 500 0.462 -
0.6044 510 0.4161 -
0.6163 520 0.388 -
0.6281 530 0.3719 -
0.64 540 0.4343 -
0.6519 550 0.3842 -
0.6637 560 0.422 -
0.6756 570 0.3523 -
0.6874 580 0.3907 -
0.6993 590 0.294 -
0.7111 600 0.4234 0.8875 (+0.0381)
0.7230 610 0.4502 -
0.7348 620 0.3912 -
0.7467 630 0.3575 -
0.7585 640 0.3319 -
0.7704 650 0.3795 -
0.7822 660 0.3854 -
0.7941 670 0.3285 -
0.8059 680 0.3836 -
0.8178 690 0.3775 -
0.8296 700 0.3503 -
0.8415 710 0.3741 -
0.8533 720 0.3502 -
0.8652 730 0.3793 -
0.8770 740 0.3352 -
0.8889 750 0.3062 -
0.9007 760 0.3634 -
0.9126 770 0.3542 -
0.9244 780 0.353 -
0.9363 790 0.3565 -
0.9481 800 0.4184 0.8945 (+0.0451)
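
The parenthesized values in the last column appear to be differences from a fixed baseline score. The sketch below recovers that baseline from each evaluated row of the table; what the baseline represents is not stated in this card (a plausible reading, labeled here as an assumption, is the NDCG@10 of the candidate ordering before reranking).

```python
# (step, ndcg@10, reported delta) copied from the evaluated rows above.
eval_rows = [
    (-1, 0.1346, -0.7148),
    (200, 0.8263, -0.0231),
    (400, 0.8758, +0.0264),
    (600, 0.8875, +0.0381),
    (800, 0.8945, +0.0451),
]

# Subtracting the delta from the score gives the implied baseline.
baselines = [round(ndcg - delta, 4) for _, ndcg, delta in eval_rows]
print(baselines)
```

Every row yields the same implied baseline of 0.8494, so the model falls short of it at step 200 and exceeds it from step 400 onwards.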

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.9.1
  • Accelerate: 1.11.0
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}