XThomasBU committed on
Commit · 7a233a3
1 Parent(s): d95aad5
updated README
README.md CHANGED
@@ -1,36 +1,61 @@
-title: Dl4ds Tutor
-emoji: π
-colorFrom: green
-colorTo: red
-sdk: docker
-pinned: false
-hf_oauth: true
----

-===========

-```
-To run the chainlit app, run the following command:
-```chainlit run main.py```

 See the [docs](https://github.com/DL4DS/dl4ds_tutor/tree/main/docs) for more information.

-##
+# DL4DS Tutor

+Check out the configuration reference at [Hugging Face Spaces Config Reference](https://huggingface.co/docs/hub/spaces-config-reference).

+You can find an implementation of the Tutor at [DL4DS Tutor on Hugging Face](https://dl4ds-dl4ds-tutor.hf.space/), which is hosted on Hugging Face [here](https://huggingface.co/spaces/dl4ds/dl4ds_tutor).

+## Running Locally

+1. **Clone the Repository**
+```bash
+git clone https://github.com/DL4DS/dl4ds_tutor
+```

+2. **Put your data under the `storage/data` directory**
+- Add URLs in the `urls.txt` file.
+- Add other PDF files in the `storage/data` directory.

+3. **Create the Vector Database**
+```bash
+cd code
+python -m modules.vectorstore.store_manager
+```
+- Note: You need to run the above command when you add new data to the `storage/data` directory, or if the `storage/data/urls.txt` file is updated.
+- Alternatively, you can set `["vectorstore"]["embedd_files"]` to `True` in the `code/modules/config/config.yaml` file, which will embed files from the storage directory every time you run the below chainlit command.

+4. **Run the Chainlit App**
+```bash
+chainlit run main.py
+```

 See the [docs](https://github.com/DL4DS/dl4ds_tutor/tree/main/docs) for more information.

+## File Structure
+
+```plaintext
+code/
+├── modules
+│   ├── chat # Contains the chatbot implementation
+│   ├── chat_processor # Contains the implementation to process and log the conversations
+│   ├── config # Contains the configuration files
+│   ├── dataloader # Contains the implementation to load the data from the storage directory
+│   ├── retriever # Contains the implementation to create the retriever
+│   └── vectorstore # Contains the implementation to create the vector database
+├── public
+│   ├── logo_dark.png # Dark theme logo
+│   ├── logo_light.png # Light theme logo
+│   └── test.css # Custom CSS file
+└── main.py
+
+
+docs/ # Contains the documentation to the codebase and methods used
+
+storage/
+├── data # Store files and URLs here
+├── logs # Logs directory, includes logs on vector DB creation, tutor logs, and chunks logged in JSON files
+└── models # Local LLMs are loaded from here
+
+vectorstores/ # Stores the created vector databases
+
+.env # This needs to be created, store the API keys here
+```
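The setup that the updated README describes (data under `storage/data`, URLs in `urls.txt`, and a hand-created `.env`) could be checked with a small script before building the vector database. The following is a minimal sketch and not part of the repository: the helper names `prepare_storage` and `missing_setup` are hypothetical, the one-URL-per-line format of `urls.txt` is an assumption, and `.env` is assumed to sit at the repository root because the file listing shows it alongside `code/` and `storage/`.

```python
from pathlib import Path


def prepare_storage(repo_root, urls):
    """Create storage/data under repo_root and write the URLs to urls.txt.

    Hypothetical helper. Assumption: urls.txt holds one URL per line;
    the README does not state the expected format.
    """
    data_dir = Path(repo_root) / "storage" / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    urls_file = data_dir / "urls.txt"
    urls_file.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return urls_file


def missing_setup(repo_root):
    """Return the README setup items that do not exist yet, as POSIX-style paths.

    Assumption: .env lives at the repository root, as the file listing suggests.
    """
    root = Path(repo_root)
    required = [root / "storage" / "data" / "urls.txt", root / ".env"]
    return [p.relative_to(root).as_posix() for p in required if not p.exists()]


if __name__ == "__main__":
    # Report what is still missing in the current working directory.
    print(missing_setup("."))
```

Running the script from the repository root lists any missing pieces. An empty list suggests the layout matches the README, though it does not check that the API keys in `.env` are valid.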