Add new SentenceTransformer model
- README.md (+126 -129)
- model.safetensors (+1 -1)
README.md
CHANGED
@@ -37,27 +37,30 @@ widget:
 \ pediatrician, or paediatrician. The word pediatrics and its cognates mean healer\
 \ of children; they derive from two Greek words: παῖς (pais child)\
 \ and ἰατρός (iatros doctor, healer)."
- - source_sentence:
-
+ - source_sentence: However , in 1919 , concluded that no more operational awards would
+   be made for the recently decreed war .
 sentences:
- -
-
-
-
-
-
- - source_sentence: A man is riding on one wheel on a motorcycle.
+ - At executive level , EEAA represents the central arm of the ministry .
+ - In 1919 , however , no operational awards would be made for the recently concluded
+   war .
+ - He was asked his opinion about the books `` Mission to Moscow '' by Joseph E.
+   Davies and `` One World '' by Wendell Willkie .
+ - source_sentence: Twelve killed in bomb blast on Pakistani train
 sentences:
- -
- -
- -
- -
+ - Five killed by bomb blast in East India
+ - Five million citizens get unofficial salary in Ukraine
+ - Above that, seniors would be responsible for 100 percent of drug costs until the
+   out-of-pocket total reaches $3,600.
+ - source_sentence: Pen Hadow, who became the first person to reach the geographic
+   North Pole unsupported from Canada, has just over two days of rations left.
 sentences:
- -
-
-
-
-
+ - Remnants of highly enriched uranium were found near an Iranian nuclear facility
+   by United Nations inspectors, deepening fears that Iran possibly has a secret
+   nuclear weapons program.
+ - However, the singer believes that artists similar to his self should not receive
+   any blame.
+ - Pen Hadow, the first person to to reach the North Pole, has only a little more
+   than two days of rations left.
 - source_sentence: what are the three subatomic particles called?
 sentences:
 - Subatomic particles include electrons, the negatively charged, almost massless
@@ -168,7 +171,7 @@ print(query_embeddings.shape, document_embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(query_embeddings, document_embeddings)
 print(similarities)
- # tensor([[
+ # tensor([[0.6104, 0.0070, 0.0514]])
 ```

 <!--
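The hunk above fills in the example output: `model.similarity(query_embeddings, document_embeddings)` returns one row per query and one column per document, which is why one query scored against three documents prints a 1×3 tensor. A minimal sketch of the full flow, assuming a placeholder repository id (the commit does not show the final model name) and plain `encode` calls:

```python
from sentence_transformers import SentenceTransformer

# Placeholder id; the final repository name is not shown in this commit.
model = SentenceTransformer("user/model-name")

query = "what are the three subatomic particles called?"
documents = [
    "Twelve killed in bomb blast on Pakistani train",
    "Five killed by bomb blast in East India",
    "Five million citizens get unofficial salary in Ukraine",
]

# encode() returns one embedding per input text.
query_embeddings = model.encode([query])       # shape: (1, dim)
document_embeddings = model.encode(documents)  # shape: (3, dim)

# similarity() computes a (num_queries, num_documents) score matrix.
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities.shape)  # torch.Size([1, 3])
```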
@@ -221,13 +224,13 @@ You can finetune this model on your own dataset.
 | | sentence1 | sentence2 | label |
 |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:------------------------------------------------|
 | type | string | string | int |
- | details | <ul><li>min:
+ | details | <ul><li>min: 10 tokens</li><li>mean: 27.74 tokens</li><li>max: 54 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 27.73 tokens</li><li>max: 55 tokens</li></ul> | <ul><li>0: ~54.60%</li><li>1: ~45.40%</li></ul> |
 * Samples:
- | sentence1
-
- | <code>
- | <code>
- | <code>
+ | sentence1 | sentence2 | label |
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+ | <code>Göttsche received international acclaim with his formula for the generating function for the Hilbert numbers of the Betti scheme of points on an algebraic surface :</code> | <code>With his formula for the producing function for the Betti - numbers of the Hilbert scheme of points on an algebraic surface , Göttsche received international recognition :</code> | <code>0</code> |
+ | <code>The former AFL players Tarkyn Lockyer ( Collingwood ) and Ryan Brabazon ( Sydney ) , Jason Mandzij ( Gold Coast ) , started their football careers and played for the Kangas .</code> | <code>Former AFL players Ryan Brabazon ( Collingwood ) and Tarkyn Lockyer ( Sydney ) , Jason Mandzij ( Gold Coast ) started their football careers playing for the Kangas .</code> | <code>0</code> |
+ | <code>Potter married in 1945 . He and his wife Anne ( a weaver ) had two children , Julian ( born 1947 ) and Mary ( born 1952 ) .</code> | <code>He and his wife Anne ( a weaver ) had two children , Julian ( born 1947 ) and Mary ( born in 1952 ) .</code> | <code>0</code> |
 * Loss: [<code>AnglELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#angleloss) with these parameters:
 ```json
 {
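The dataset hunks above and below pair two sentence columns with an integer label and train with AnglELoss. A rough sketch of how such a pair dataset and loss are wired up in sentence-transformers (the base model is a placeholder; the sample texts come from this card's own tables):

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import AnglELoss

model = SentenceTransformer("bert-base-uncased")  # placeholder base model

# Columns mirror the statistics tables: sentence1, sentence2, and an int label.
train_dataset = Dataset.from_dict({
    "sentence1": ["Twelve killed in bomb blast on Pakistani train"],
    "sentence2": ["Five killed by bomb blast in East India"],
    "label": [0],
})

# AnglELoss is a CoSENT-style pairwise objective using an angle-based
# similarity; scale=20.0 is the library default.
loss = AnglELoss(model, scale=20.0)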
@@ -244,16 +247,16 @@ You can finetune this model on your own dataset.
 * Size: 11,004 training samples
 * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
- | | sentence1 | sentence2
-
- | type | string | string
- | details | <ul><li>min:
+ | | sentence1 | sentence2 | label |
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:------------------------------------------------|
+ | type | string | string | int |
+ | details | <ul><li>min: 10 tokens</li><li>mean: 27.33 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 12 tokens</li><li>mean: 27.3 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>0: ~32.40%</li><li>1: ~67.60%</li></ul> |
 * Samples:
- | sentence1
-
- | <code>
- | <code>
- | <code>
+ | sentence1 | sentence2 | label |
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+ | <code>Passed in 1999 but never put into effect , the law would have made it illegal for bar and restaurant patrons to light up .</code> | <code>Passed in 1999 but never put into effect , the smoking law would have prevented bar and restaurant patrons from lighting up , but exempted private clubs from the regulation .</code> | <code>0</code> |
+ | <code>" Indeed , Iran should be put on notice that efforts to try to remake Iraq in their image will be aggressively put down , " he said .</code> | <code>" Iran should be on notice that attempts to remake Iraq in Iran 's image will be aggressively put down , " he said .</code> | <code>1</code> |
+ | <code>But U.S. troops will not shrink from mounting raids and attacking their foes when their locations can be pinpointed .</code> | <code>But American troops will not shrink from mounting raids in the locations of their foes that can be pinpointed .</code> | <code>1</code> |
 * Loss: [<code>AnglELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#angleloss) with these parameters:
 ```json
 {
@@ -273,13 +276,13 @@ You can finetune this model on your own dataset.
 | | sentence1 | sentence2 | label |
 |:--------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|:------------------------------------------------|
 | type | string | string | int |
- | details | <ul><li>min:
+ | details | <ul><li>min: 6 tokens</li><li>mean: 13.83 tokens</li><li>max: 33 tokens</li></ul> | <ul><li>min: 29 tokens</li><li>mean: 340.09 tokens</li><li>max: 1024 tokens</li></ul> | <ul><li>0: ~31.40%</li><li>1: ~68.60%</li></ul> |
 * Samples:
- | sentence1
-
- | <code>
- | <code>
- | <code>
+ | sentence1 | sentence2 | label |
+ |:----------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+ | <code>Performance (film) is a religion.</code> | <code>The associative model of data is a data model for database systems .. data model. data model. database. database. Other data models , such as the relational model and the object data model , are record-based .. data model. data model. relational model. relational model. These models involve encompassing attributes about a thing , such as a car , in a record structure .. Such attributes might be registration , colour , make , model , etc. .. In the associative model , everything which has `` discrete independent existence '' is modeled as an entity , and relationships between them are modeled as associations .. The granularity at which data is represented is similar to schemes presented by Chen -LRB- Entity-relationship model -RRB- ; Bracchi , Paolini and Pelagatti -LRB- Binary Relations -RRB- ; and Senko -LRB- The Entity Set Model -RRB- .. Entity-relationship model. Entity-relationship model. A number of claims made about the model by Simon Williams , in his book The Associative Model ...</code> | <code>1</code> |
+ | <code>American Gods (TV series) has one showrunner, whose name is Greg Berlanti.</code> | <code>American Gods is an American television series based on the novel of the same name , written by Neil Gaiman and originally published in 2001 .. American Gods. American Gods. Neil Gaiman. Neil Gaiman. novel of the same name. American Gods. The television series was developed by Bryan Fuller and Michael Green for the premium cable network Starz .. Bryan Fuller. Bryan Fuller. Michael Green. Michael Green ( writer ). Starz. Starz. Fuller and Green are the showrunners for the series .. Gaiman serves as an executive producer along with Fuller , Green , Craig Cegielski , Stefanie Berk , and Thom Beers .. Thom Beers. Thom Beers. The first episode premiered on the Starz network and through their streaming application on April 30 , 2017 .. Starz. Starz. In May 2017 , the series was renewed for a second season .</code> | <code>0</code> |
+ | <code>The Ren & Stimpy Show was one of the original four Nicktoons.</code> | <code>Cloud was a browser-based operating system created by Good OS LLC , a Los Angeles-based corporation .. Los Angeles. Los Angeles. The company initially launched a Linux distribution called gOS which is heavily based on Ubuntu , now in its third incarnation .. gOS. gOS ( operating system ). Ubuntu. Ubuntu ( operating system )</code> | <code>1</code> |
 * Loss: [<code>AnglELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#angleloss) with these parameters:
 ```json
 {
@@ -299,13 +302,13 @@ You can finetune this model on your own dataset.
 | | sentence1 | sentence2 | label |
 |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:------------------------------------------------|
 | type | string | string | int |
- | details | <ul><li>min: 6 tokens</li><li>mean: 22.
+ | details | <ul><li>min: 6 tokens</li><li>mean: 22.32 tokens</li><li>max: 61 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 22.15 tokens</li><li>max: 46 tokens</li></ul> | <ul><li>0: ~54.50%</li><li>1: ~45.50%</li></ul> |
 * Samples:
- | sentence1
-
- | <code>
- | <code>
- | <code>
+ | sentence1 | sentence2 | label |
+ |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------|:---------------|
+ | <code>the process of shrinking the size of a file by removing data or recoding it more efficiently</code> | <code>reducing the amount of space needed to store a piece of data/bandwidth to transmit it (ex. zip files)</code> | <code>0</code> |
+ | <code>the siem software can ensure that the time is the same across devices so the security events across devices are recorded at the same time.</code> | <code>feature of a siem that makes sure all products are synced up so they are running with the same timestamps.</code> | <code>1</code> |
+ | <code>a model that is part of a dssa to describe the context and domain semantics important to understand a reference architecture and its architectural decisions</code> | <code>provide a means of information about that class of system and of comparing different architectures</code> | <code>0</code> |
 * Loss: [<code>AnglELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#angleloss) with these parameters:
 ```json
 {
@@ -322,16 +325,16 @@ You can finetune this model on your own dataset.
 * Size: 10,047 training samples
 * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
- | | sentence1 | sentence2
-
- | type | string | string
- | details | <ul><li>min: 4 tokens</li><li>mean: 17.
+ | | sentence1 | sentence2 | label |
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:------------------------------------------------|
+ | type | string | string | int |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 17.82 tokens</li><li>max: 124 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 17.3 tokens</li><li>max: 143 tokens</li></ul> | <ul><li>0: ~37.20%</li><li>1: ~62.80%</li></ul> |
 * Samples:
- | sentence1
-
- | <code>
- | <code>
- | <code>
+ | sentence1 | sentence2 | label |
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+ | <code>"Kahuku Ranch has world - class qualities - tremendous resources, tremendous beauty and tremendous value to global biodiversity."</code> | <code>"TRENENDOUS BEAUTY AND TREMENDOUS VALUE TO GLOBAL BIODERVERSITY TRENENDOUS RESOURCES-CLASS QUALITIES KAHUKU RANCH HAS WORLD"</code> | <code>1</code> |
+ | <code>In Damascus, Syrian Information Minister Ahmad al-Hassan called the charges "baseless and illogical".</code> | <code>The Syrian Information Minister Ahmad al-Hassan, in Damascus, termed the charges without base and with no logic behind</code> | <code>1</code> |
+ | <code>We'd talk about the stars... ...and whether there might be somebody else like us out in space,... ...places we wanted to go and... it made our trials seem smaller.</code> | <code>We often would talk about the stars and if somebody else is similar to us out in the universe, places we wanted to visit and it made our problems seem minuscule.</code> | <code>1</code> |
 * Loss: [<code>AnglELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#angleloss) with these parameters:
 ```json
 {
@@ -351,13 +354,13 @@ You can finetune this model on your own dataset.
 | | sentence1 | sentence2 | label |
 |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------|
 | type | string | string | float |
- | details | <ul><li>min: 6 tokens</li><li>mean:
+ | details | <ul><li>min: 6 tokens</li><li>mean: 15.23 tokens</li><li>max: 50 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 15.39 tokens</li><li>max: 51 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 2.73</li><li>max: 5.0</li></ul> |
 * Samples:
- | sentence1
-
- | <code>
- | <code>
- | <code>The
+ | sentence1 | sentence2 | label |
+ |:--------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------|:--------------------------------|
+ | <code>China's anger at N. Korea overcomes worry over US</code> | <code>China's anger at North Korea overcomes worry over U.S. stealth flights</code> | <code>3.200000047683716</code> |
+ | <code>Declining issues outnumbered advancers nearly 2 to 1 on the New York Stock Exchange.</code> | <code>Advancers outnumbered decliners by nearly 8 to 3 on the NYSE and more than 11 to 5 on Nasdaq.</code> | <code>1.7999999523162842</code> |
+ | <code>The computers were reportedly located in the U.S., Canada and South Korea.</code> | <code>The PCs are scattered across the United States, Canada and South Korea.</code> | <code>4.75</code> |
 * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
 ```json
 {
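From here the splits switch to CoSENTLoss, whose labels are float similarity scores (roughly 0-5 in the samples) rather than binary ints. A minimal sketch under the same assumptions as the earlier snippet:

```python
from datasets import Dataset
from sentence_transformers.losses import CoSENTLoss

# Float labels, as in the samples above (e.g. 4.75 for near-paraphrases).
train_dataset = Dataset.from_dict({
    "sentence1": ["The computers were reportedly located in the U.S., Canada and South Korea."],
    "sentence2": ["The PCs are scattered across the United States, Canada and South Korea."],
    "label": [4.75],
})

# CoSENTLoss ranks pairs so that higher-labeled pairs receive higher cosine
# similarity; scale=20.0 is the library default. `model` is the
# SentenceTransformer from the previous sketch.
loss = CoSENTLoss(model, scale=20.0)
```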
@@ -377,13 +380,13 @@ You can finetune this model on your own dataset.
 | | sentence1 | sentence2 | label |
 |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------|
 | type | string | string | float |
- | details | <ul><li>min: 6 tokens</li><li>mean: 12.
+ | details | <ul><li>min: 6 tokens</li><li>mean: 12.39 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 12.16 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 1.0</li><li>mean: 3.48</li><li>max: 5.0</li></ul> |
 * Samples:
- | sentence1
-
- | <code>
- | <code>
- | <code>A
+ | sentence1 | sentence2 | label |
+ |:-------------------------------------------------------------------|:------------------------------------------------------------|:-------------------------------|
+ | <code>Someone is cutting some paper with scissors</code> | <code>The piece of paper is being cut</code> | <code>4.5</code> |
+ | <code>A man is hanging up the phone</code> | <code>A man is making a phone call</code> | <code>3.799999952316284</code> |
+ | <code>A person is pouring olive oil into a pot on the stove</code> | <code>A person is pouring oil for cooking into a pot</code> | <code>4.300000190734863</code> |
 * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
 ```json
 {
@@ -400,16 +403,16 @@ You can finetune this model on your own dataset.
 * Size: 14,280 training samples
 * Columns: <code>label</code>, <code>sentence1</code>, and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
- | | label | sentence1 | sentence2
-
- | type | float | string | string
- | details | <ul><li>min: 0.0</li><li>mean: 3.
+ | | label | sentence1 | sentence2 |
+ |:--------|:---------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+ | type | float | string | string |
+ | details | <ul><li>min: 0.0</li><li>mean: 3.16</li><li>max: 5.0</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 19.29 tokens</li><li>max: 80 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 17.45 tokens</li><li>max: 82 tokens</li></ul> |
 * Samples:
- | label
-
- | <code>
- | <code>
- | <code>
+ | label | sentence1 | sentence2 |
+ |:------------------|:------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------|
+ | <code>1.0</code> | <code>How do I wire a bathroom exhaust fan/light to two switches?</code> | <code>How do I wire a combo with two supplies?</code> |
+ | <code>4.2</code> | <code>How an all-American hero fell to earth - . (Where have all the REAL heroes gone?)</code> | <code>How all-American hero fell to earth</code> |
+ | <code>3.75</code> | <code>Be larger in number, quantity, power, status, or importance, without personally having sovereign power.</code> | <code>be larger in number, quantity, power, status or importance.</code> |
 * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
 ```json
 {
@@ -728,15 +731,11 @@ You can finetune this model on your own dataset.
 ### Training Hyperparameters
 #### Non-Default Hyperparameters

- - `per_device_train_batch_size`:
- - `learning_rate`:
- - `weight_decay`:
+ - `per_device_train_batch_size`: 384
+ - `learning_rate`: 1.0
+ - `weight_decay`: 6e-05
 - `num_train_epochs`: 1
- - `warmup_ratio`: 0.03
 - `fp16`: True
- - `gradient_checkpointing`: True
- - `torch_compile`: True
- - `torch_compile_backend`: inductor

 #### All Hyperparameters
 <details><summary>Click to expand</summary>
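The non-default hyperparameters above map directly onto `SentenceTransformerTrainingArguments`; the settings this commit removes (warmup ratio, gradient checkpointing, torch compile) simply fall back to their defaults. A sketch with the values as listed, where `output_dir` is a placeholder:

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",              # placeholder; not shown in the diff
    per_device_train_batch_size=384,
    learning_rate=1.0,                # value exactly as listed in the updated card
    weight_decay=6e-05,
    num_train_epochs=1,
    fp16=True,
)
```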
@@ -745,15 +744,15 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: no
 - `prediction_loss_only`: True
- - `per_device_train_batch_size`:
+ - `per_device_train_batch_size`: 384
 - `per_device_eval_batch_size`: 8
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
- - `learning_rate`:
- - `weight_decay`:
+ - `learning_rate`: 1.0
+ - `weight_decay`: 6e-05
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
@@ -762,7 +761,7 @@ You can finetune this model on your own dataset.
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
- - `warmup_ratio`: 0.
+ - `warmup_ratio`: 0.0
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
@@ -828,7 +827,7 @@ You can finetune this model on your own dataset.
 - `hub_private_repo`: None
 - `hub_always_push`: False
 - `hub_revision`: None
- - `gradient_checkpointing`:
+ - `gradient_checkpointing`: False
 - `gradient_checkpointing_kwargs`: None
 - `include_inputs_for_metrics`: False
 - `include_for_metrics`: []
@@ -842,8 +841,8 @@ You can finetune this model on your own dataset.
 - `torchdynamo`: None
 - `ray_scope`: last
 - `ddp_timeout`: 1800
- - `torch_compile`:
- - `torch_compile_backend`:
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
 - `torch_compile_mode`: None
 - `include_tokens_per_second`: False
 - `include_num_input_tokens_seen`: no
@@ -866,45 +865,43 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch | Step | Training Loss |
 |:------:|:-----:|:-------------:|
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.9528 | 19000 | 1.9995 |
- | 0.9778 | 19500 | 1.8916 |
+ | 0.0267 | 500 | 4.3558 |
+ | 0.0535 | 1000 | 3.0724 |
+ | 0.0802 | 1500 | 2.979 |
+ | 0.1070 | 2000 | 2.9205 |
+ | 0.1337 | 2500 | 3.0679 |
+ | 0.1604 | 3000 | 2.837 |
+ | 0.1872 | 3500 | 3.2635 |
+ | 0.2139 | 4000 | 2.7602 |
+ | 0.2407 | 4500 | 2.6911 |
+ | 0.2674 | 5000 | 2.6963 |
+ | 0.2941 | 5500 | 2.8504 |
+ | 0.3209 | 6000 | 2.7501 |
+ | 0.3476 | 6500 | 2.6315 |
+ | 0.3744 | 7000 | 2.5372 |
+ | 0.4011 | 7500 | 2.8814 |
+ | 0.4278 | 8000 | 2.2826 |
+ | 0.4546 | 8500 | 2.764 |
+ | 0.4813 | 9000 | 2.4418 |
+ | 0.5080 | 9500 | 2.3762 |
+ | 0.5348 | 10000 | 2.5542 |
+ | 0.5615 | 10500 | 2.2653 |
+ | 0.5883 | 11000 | 2.5098 |
+ | 0.6150 | 11500 | 2.3009 |
+ | 0.6417 | 12000 | 2.4029 |
+ | 0.6685 | 12500 | 2.1538 |
+ | 0.6952 | 13000 | 2.6398 |
+ | 0.7220 | 13500 | 2.3101 |
+ | 0.7487 | 14000 | 2.8489 |
+ | 0.7754 | 14500 | 2.3822 |
+ | 0.8022 | 15000 | 2.3035 |
+ | 0.8289 | 15500 | 2.4212 |
+ | 0.8557 | 16000 | 2.1447 |
+ | 0.8824 | 16500 | 1.985 |
+ | 0.9091 | 17000 | 2.1427 |
+ | 0.9359 | 17500 | 2.3002 |
+ | 0.9626 | 18000 | 2.2671 |
+ | 0.9894 | 18500 | 2.3033 |


 ### Framework Versions
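Tying the sketches together: the card trains on several datasets at once, which `SentenceTransformerTrainer` supports by accepting dicts of datasets and losses keyed by dataset name (the `"pairs"` key below is hypothetical). Training this way produces step/loss logs like the table above:

```python
from sentence_transformers import SentenceTransformerTrainer

trainer = SentenceTransformerTrainer(
    model=model,                             # from the earlier sketches
    args=args,
    train_dataset={"pairs": train_dataset},  # hypothetical dataset name
    loss={"pairs": loss},                    # per-dataset loss mapping
)
trainer.train()
```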
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:
+ oid sha256:e77447660a82d5a1c32834edf181ece19138af5fe0d1a489194f8674d0a79f19
 size 127538496
|