# File Loader Configuration

DeepSearcher supports various file loaders to extract and process content from different file formats.

## 📝 Basic Configuration

```python
config.set_provider_config("file_loader", "(FileLoaderName)", "(Arguments dict)")
```
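
As a rough illustration of filling in the placeholders above, the sketch below builds the three arguments for `set_provider_config`. The helper `file_loader_args` and the `KNOWN_FILE_LOADERS` set are hypothetical names introduced here and are not part of DeepSearcher; the two loader names are the ones documented on this page.

```python
# Hypothetical helper (not part of DeepSearcher): builds the arguments
# for config.set_provider_config from a loader name and an options dict.
KNOWN_FILE_LOADERS = {"UnstructuredLoader", "DoclingLoader"}


def file_loader_args(name, arguments=None):
    """Return the (component, provider, arguments) triple for a file loader."""
    if name not in KNOWN_FILE_LOADERS:
        raise ValueError(f"Unknown file loader: {name!r}")
    # Copy the dict so later changes by the caller do not leak into the config.
    return ("file_loader", name, dict(arguments or {}))
```

Assuming `config` is the same configuration object used in the snippet above, the call would then read `config.set_provider_config(*file_loader_args("DoclingLoader"))`.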

## 📋 Available File Loaders

| Loader | Description | Supported Formats |
|--------|-------------|-------------------|
| **UnstructuredLoader** | General-purpose document loader with broad format support | PDF, DOCX, PPT, HTML, etc. |
| **DoclingLoader** | Document processing library with extraction capabilities | See [documentation](https://docling-project.github.io/docling/usage/supported_formats/) |

## 🔍 File Loader Options

### Unstructured

[Unstructured](https://unstructured.io/) is a powerful library for extracting content from various document formats.

```python
config.set_provider_config("file_loader", "UnstructuredLoader", {})
```

??? tip "Setup Instructions"

    You can use Unstructured in two ways:

    1. **With API** (recommended for production)
        - Set environment variables:
            - `UNSTRUCTURED_API_KEY`
            - `UNSTRUCTURED_API_URL`

    2. **Local Processing**
        - Leave the API environment variables unset
        - Install required dependencies:
            ```bash
            # Install core dependencies
            pip install unstructured-ingest

            # For all document formats
            pip install "unstructured[all-docs]"

            # For specific formats (e.g., PDF only)
            pip install "unstructured[pdf]"
            ```

    For more information:

    - [Unstructured Documentation](https://docs.unstructured.io/ingestion/overview)
    - [Installation Guide](https://docs.unstructured.io/open-source/installation/full-installation)
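
To make the choice between the two Unstructured modes concrete, here is a minimal sketch that reports which mode a given environment would select. It assumes API mode needs both variables to be set; the function name `unstructured_mode` is hypothetical, and DeepSearcher's own check may differ.

```python
import os


def unstructured_mode(env=None):
    """Return "api" when both API variables are set, otherwise "local"."""
    # Assumption: API mode requires both the key and the URL to be non-empty.
    env = os.environ if env is None else env
    has_key = bool(env.get("UNSTRUCTURED_API_KEY"))
    has_url = bool(env.get("UNSTRUCTURED_API_URL"))
    return "api" if has_key and has_url else "local"


if __name__ == "__main__":
    print(f"Unstructured would run in {unstructured_mode()} mode")
```

Running a check like this before loading files can help confirm that local dependencies are only needed when the API variables are absent.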

### Docling

[Docling](https://docling-project.github.io/docling/) provides document processing capabilities with support for multiple formats.

```python
config.set_provider_config("file_loader", "DoclingLoader", {})
```

??? tip "Setup Instructions"

    1. Install Docling:
        ```bash
        pip install docling
        ```

    2. For information on supported formats, see the [Docling documentation](https://docling-project.github.io/docling/usage/supported_formats/#supported-output-formats).
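
Before selecting `DoclingLoader`, it can be useful to confirm that the package from the install step is importable. The sketch below uses only the standard library; `is_installed` is a hypothetical helper name, not a DeepSearcher or Docling function.

```python
import importlib.util


def is_installed(package):
    """Return True if the top-level package can be found for import."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    if not is_installed("docling"):
        print("Docling is not installed; run: pip install docling")
```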