---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:210
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-l
widget:
- source_sentence: 'What does maintenance refer to in the context of providing for another person? '
  sentences:
  - '-M- Maintenance: The f urnishing by one person to another the means of living, or fo od, clothing,'
  - 'income and expenses to determine if the debtor may proceed under Chapter 7. Chapter 7 trustee A person appointed in a Chapter 7 case to represent the interests of the bankruptcy estate and the creditors. The trustee''s responsibilities include reviewing the debtor''s petition and schedules, liquidating the property of the estate, and making distributions to creditors. The trustee may also bring actions against creditors or the debtor to recover property of the bankruptcy estate. Chapter 9'
  - '-19- Trial De Novo: A new trial (See 22NYCRR 28.12). -U- Undertaking: Deposit of a sum of money or filing of a bond in court, to secure some actual or potential obligation. -V- Vacate: To set aside or undo a previous action or order. Venire: Technically, a writ summoning persons to court to act as jurors; popularly used as meaning the body of names thus summoned. Venue: (a) Geographical place where some legal matter occurs or may be determined. (b) The geographical area within which a court has jurisdiction. It relates only to a place or territory within which either party may require a case to be tried. A defect in venue may be waived by the parties.'
- source_sentence: 'What does the term "Pro Se" refer to in a legal context? '
  sentences:
  - 'Process: A l egal means, such as a s ummons, used to s ubject a de fendant i n a l awsuit to the personal jur isdiction o f the c ourt; broa dly, r efers to all writs iss ued i n the c ourse of a le gal proceeding - what is served to obtain jurisdiction. Pro Se (aka Self-Represented): Appearing on one’s own behalf without an attorney. Purge: To atone for or correct an offense, to submit to a court''s mandate (i.e., to purge oneself of contempt of court). -Q- None. -R- Recuse: To disqualify oneself as a judge. Redact: To edit, revise or block out written text. Referee: A person to whom a claim pending in a court is referred by the court to take testimony,'
  - '-10- Hearing: A pr eliminary examination where testimony is given and e vidence presented for the purpose of determining an issue of fact and reaching a decision on the basis of that evidence. Hearsay: Testimony of a witness who relates not what he/she knows personally, but what others have told the witness, or what the witness has heard said by others; may be admissible or inadmissible in court depending upon rules of evidence. Hung Jury: A jury whose members cannot reconcile their differences of opinion and thus cannot reach a verdict. -I- Impaneling: The process by which jurors are selected and sworn to their task. Impleader: An addition of another party to an action by the defendant, a “third party” claim.'
  - '-12- Jurisdiction, Subject Matter: Whether the court has authority over the thing or right claimed by one party against another. Jury: A prescribed number of persons selected according to law and sworn to make findings of fact. Jury (Advisory): A body of jurors impaneled to hear a case in which the parties have no right to a jury trial - the judge remains solely responsible for the findings and may accept or reject the jury''s verdict. Jury Instructions: Directions given by the judge to the jury, at the beginning and end of trial. -K- None. -L- Laches: The failure to diligently assert a right, which results in a refusal to allow the right to be asserted later. Legal Age: Eighteen (18) years of age. See CPLR Section 1206.'
- source_sentence: What is the purpose of a Chapter 11 bankruptcy filing?
  sentences:
  - 'condemnation, i.e., the legal process by which real estate of a private owner is taken for public use without the owner''s consent, but upon the award and payment of just compensation. Enjoin: To require a person, by writ of injunction from a court of equity, to perform or to refrain from or cease doing some act. Entry: The formal filing of an order of judgment with the County Clerk. Equitable Action (Equity Matter): An action which may be brought for the purpose of restraining'
  - 'A legal claim. Chambers The offices of a judge and his or her staff. Chapter 11 A reorganization bankruptcy, usually involving a corporation or partnership. A Chapter 11 debtor usually proposes a plan of reorganization to keep its business alive and pay creditors over time. Individuals or people in business can also seek relief in Chapter 11. Chapter 12 The chapter of the Bankruptcy Code providing for adjustment of debts of a "family farmer" or "family fisherman," as the terms are defined in the Bankruptcy Code. Chapter 13 The chapter of the Bankruptcy Code providing for the adjustment of debts of an individual with regular income, often referred to as a "wage-earner" plan. Chapter 13 allows a debtor'
  - 'Conviction A judgment of guilt against a criminal defendant. Counsel Legal advice; a term also used to refer to the lawyers in a case. Count An allegation in an indictment or information, charging a defendant with a crime. An indictment or information may contain allegations that the defendant committed more than one crime. Each allegation is referred to as a count. Court Government entity authorized to resolve legal disputes. Judges sometimes use "court" to refer to themselves in the third person, as in "the court has read the briefs." Court reporter A person who makes a word-for-word record of what is said in court, generally by using a stenographic machine, shorthand or audio recording, and then produces a transcript of the'
- source_sentence: 'What types of property may a debtor be able to exempt under the homestead exemption? '
  sentences:
  - '-2- Affidavit of Service: An affidavit intended to certify or prove that service of a writ, notice, or other document has been made. Affirm: An act of declaring something to be true under the penalty of perjury by a person who conscientiously declines to take an oath for religious or other pertinent reasons; also attorneys are permitted to affirm rather than swear under oath. Affirmation: A solemn and formal declaration under penalties of perjury that a statement is true, without an oath. Affirmed: Upheld, agreed with (e.g.,The Appellate Court affirmed the judgment of the City Court); also means a challenge to a court decision or order was rejected.'
  - 'A formal request for the protection of the federal bankruptcy laws. (There is an official form for bankruptcy petitions.) Bankruptcy trustee A private individual or corporation appointed in all Chapter 7 and Chapter 13 cases to represent the interests of the bankruptcy estate and the debtor''s creditors. Bench trial A trial without a jury, in which the judge serves as the fact-finder. Brief A written statement submitted in a trial or appellate proceeding that explains one side''s legal and factual arguments. Burden of proof The duty to prove disputed facts. In civil cases, a plaintiff generally has the burden of proving his or her case. In criminal cases, the government has the burden of proving the defendant''s guilt. (See standard of proof.)'
  - 'residence (homestead exemption), or some or all "tools of the trade" used by the debtor to make a living (i.e., auto tools for an auto mechanic or dental tools for a dentist). The availability and amount of property the debtor may exempt depends on the state the debtor lives in. F Face sheet filing A bankruptcy case filed either without schedules or with incomplete schedules listing few creditors and debts. (Face sheet filings are often made for the purpose of delaying an'
- source_sentence: How does a fraudulent transfer relate to a debtor's intent in bankruptcy cases?
  sentences:
  - 'Glossary of Legal Terms Find definitions of legal terms to help understand the federal court system. A Acquittal A jury verdict that a criminal defendant is not guilty, or the finding of a judge that the evidence is insufficient to support a conviction. Active judge A judge in the full-time service of the court. Compare to senior judge. Administrative Office of the United States Courts (AO) Enter legal term to search for definition Search'
  - 'A serious crime, usually punishable by at least one year in prison. File To place a paper in the official custody of the clerk of court to enter into the files or records of a case. Fraudulent transfer A transfer of a debtor''s property made with intent to defraud or for which the debtor receives less than the transferred property''s value. Fresh start The characterization of a debtor''s status after bankruptcy, i.e., free of most debts. (Giving debtors a fresh start is one purpose of the Bankruptcy Code.) G Grand jury A body of 16-23 citizens who listen to evidence of criminal allegations, which is presented by the prosecutors, and determine whether there is probable cause to believe an individual'
  - '-3- Argument: A reason given in proof or rebuttal to persuade a judge or jury. At Issue: Whenever the parties to an action come to a point in the pleadings or argument which is affirmed on one side and denied on the other, the points are said to be "at issue". Attachment: The taking of property into legal custody by an enforcement officer (See specialty section: Recovery of Chattel). Attestation: The act of witnessing an instrument in writing at the request of the party making the instrument and signing it as a witness. Attorney of Record: Attorney whose name appears in the court’s records or files of a case. Award: A decision of an Arbitrator, judge or jury. -B-'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: cosine_accuracy@1
      value: 0.9318181818181818
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.9318181818181818
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.9545454545454546
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1.0
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.9318181818181818
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3106060606060606
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1909090909090909
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09999999999999996
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.9318181818181818
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.9318181818181818
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.9545454545454546
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1.0
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9565434941101226
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9438131313131314
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9438131313131314
      name: Cosine Map@100
---

# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("vin00d/snowflake-arctic-legal-ft-1")
# Run inference
sentences = [
    "How does a fraudulent transfer relate to a debtor's intent in bankruptcy cases?",
    "A serious crime, usually punishable by at least one year in prison.\nFile\nTo place a paper in the official custody of the clerk of court to enter into the files or records\nof a case.\nFraudulent transfer\nA transfer of a debtor's property made with intent to defraud or for which the debtor\nreceives less than the transferred property's value.\nFresh start\nThe characterization of a debtor's status after bankruptcy, i.e., free of most debts. (Giving\ndebtors a fresh start is one purpose of the Bankruptcy Code.)\nG\nGrand jury\nA body of 16-23 citizens who listen to evidence of criminal allegations, which is presented by\nthe prosecutors, and determine whether there is probable cause to believe an individual",
    '-3-\nArgument: A reason given in proof or rebuttal to persuade a judge or jury.\nAt Issue: Whenever the parties to an action come to a point in the pleadings or argument which\nis affirmed on one side and denied on the other, the points are said to be "at issue".\nAttachment: The taking of property into legal custody by an enforcement officer (See specialty\nsection: Recovery of Chattel).\nAttestation: The act of witnessing an instrument in writing at the request of the party making the\ninstrument and signing it as a witness.\nAttorney of Record: Attorney whose name appears in the court’s records or files of a case.\nAward: A decision of an Arbitrator, judge or jury.\n-B-',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Information Retrieval

* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.9318     |
| cosine_accuracy@3   | 0.9318     |
| cosine_accuracy@5   | 0.9545     |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.9318     |
| cosine_precision@3  | 0.3106     |
| cosine_precision@5  | 0.1909     |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.9318     |
| cosine_recall@3     | 0.9318     |
| cosine_recall@5     | 0.9545     |
| cosine_recall@10    | 1.0        |
| **cosine_ndcg@10**  | **0.9565** |
| cosine_mrr@10       | 0.9438     |
| cosine_map@100      | 0.9438     |

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 210 training samples
* Columns: `sentence_0` and `sentence_1`
* Approximate statistics based on the first 210 samples:

|         | sentence_0 | sentence_1 |
|:--------|:-----------|:-----------|
| type    | string     | string     |
| details |            |            |

* Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| What is the purpose of the glossary of common legal terms provided in the context? | GLOSSARY ‐ COMMON LEGAL TERMS<br>NOTE: The following definitions are not legal definitions. Rather, these definitions are<br>intended to give you a general idea of the meanings of common legal words. For<br>comprehensive Definitions of legal terms, you may wish to consult a legal dictionary<br>“Black’s Law Dictionary” is one such legal dictionary which is usually available at<br>most law libraries.<br>This glossary of common legal terms is also available on‐line at:<br>http://www.nycourts.gov/lawlibraries/glossary.shtml<br><br>ADDITIONAL ON‐LINE RESOURCES:<br>http://www.nolo.com/glossary.cfm<br>Nolo’s on‐line legal dictionary.<br>http://www.law‐dictionary.org/<br>Free on‐line legal dictionary search engine.<br>http://www.law.cornell.edu/wex |
| Where can one find a comprehensive legal dictionary for more detailed definitions of legal terms? | GLOSSARY ‐ COMMON LEGAL TERMS<br>NOTE: The following definitions are not legal definitions. Rather, these definitions are<br>intended to give you a general idea of the meanings of common legal words. For<br>comprehensive Definitions of legal terms, you may wish to consult a legal dictionary<br>“Black’s Law Dictionary” is one such legal dictionary which is usually available at<br>most law libraries.<br>This glossary of common legal terms is also available on‐line at:<br>http://www.nycourts.gov/lawlibraries/glossary.shtml<br><br>ADDITIONAL ON‐LINE RESOURCES:<br>http://www.nolo.com/glossary.cfm<br>Nolo’s on‐line legal dictionary.<br>http://www.law‐dictionary.org/<br>Free on‐line legal dictionary search engine.<br>http://www.law.cornell.edu/wex |
| What organization maintains the legal dictionary and encyclopedia mentioned in the context? | Legal dictionary and encyclopedia maintained by the<br>Legal Information Institute at Cornell Law School. |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
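
The `matryoshka_dims` in the loss configuration (768, 512, 256, 128, 64) suggest that leading prefixes of the 1024-dimensional embedding are meant to stay useful on their own. The following is a minimal sketch of that idea, not code from the training run: it truncates a vector to each dimension, rescales it to unit length, and compares two vectors by cosine similarity. The helper names `truncate_and_normalize` and `cosine`, and the toy vectors, are assumptions made for illustration; real inputs would come from `model.encode(...)`.

```python
import math

# Dimensions copied from the card's `matryoshka_dims` setting.
MATRYOSHKA_DIMS = [768, 512, 256, 128, 64]


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero prefix
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for two 1024-dimensional embeddings (hypothetical values,
# not real model output).
query = [math.sin(0.01 * i) for i in range(1024)]
passage = [math.sin(0.01 * i + 0.1) for i in range(1024)]

for dim in MATRYOSHKA_DIMS:
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine(q, p), 4))
```

Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which may be a more convenient route than truncating by hand; how much retrieval quality is retained at each dimension is not reported in this card and would need to be measured.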

### Training Logs

| Epoch  | Step | cosine_ndcg@10 |
|:------:|:----:|:--------------:|
| 1.0    | 21   | 0.9240         |
| 2.0    | 42   | 0.9628         |
| 2.3810 | 50   | 0.9628         |
| 3.0    | 63   | 0.9502         |
| 4.0    | 84   | 0.9569         |
| 4.7619 | 100  | 0.9563         |
| 5.0    | 105  | 0.9556         |
| 6.0    | 126  | 0.9569         |
| 7.0    | 147  | 0.9555         |
| 7.1429 | 150  | 0.9555         |
| 8.0    | 168  | 0.9565         |
| 9.0    | 189  | 0.9565         |
| 9.5238 | 200  | 0.9565         |
| 10.0   | 210  | 0.9565         |

### Framework Versions

- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```