FireCrawl Integration Example
This example demonstrates how to use FireCrawl with DeepSearcher to crawl and extract content from websites.
Overview
FireCrawl is a specialized web crawling service designed for AI applications. This example shows:
- Setting up FireCrawl with DeepSearcher
- Configuring API keys for OpenAI and FireCrawl
- Crawling a website and extracting content
- Querying the extracted content
Code Example
```python
import logging
import os

from deepsearcher.configuration import Configuration, init_config
from deepsearcher.offline_loading import load_from_website
from deepsearcher.online_query import query

# Suppress unnecessary logging from third-party libraries
logging.getLogger("httpx").setLevel(logging.WARNING)

# Set API keys (ensure these are set securely in real applications).
# setdefault keeps any key that is already exported in your shell.
os.environ.setdefault('OPENAI_API_KEY', 'sk-***************')
os.environ.setdefault('FIRECRAWL_API_KEY', 'fc-***************')


def main():
    # Step 1: Initialize configuration
    config = Configuration()

    # Set up Vector Database (Milvus) and Web Crawler (FireCrawlCrawler)
    config.set_provider_config("vector_db", "Milvus", {})
    config.set_provider_config("web_crawler", "FireCrawlCrawler", {})

    # Apply the configuration
    init_config(config)

    # Step 2: Load data from a website into Milvus
    website_url = "https://example.com"  # Replace with your target website
    collection_name = "FireCrawl"
    collection_description = "All Milvus Documents"

    # Crawl a single webpage
    load_from_website(urls=website_url, collection_name=collection_name, collection_description=collection_description)

    # FireCrawl only: crawl multiple pages by setting max_depth, limit and allow_backward_links
    # load_from_website(urls=website_url, max_depth=2, limit=20, allow_backward_links=True,
    #                   collection_name=collection_name, collection_description=collection_description)

    # Step 3: Query the loaded data
    question = "What is Milvus?"  # Replace with your actual question
    # query() returns a tuple: (answer, retrieved_results, consumed_tokens)
    answer, _retrieved, _tokens = query(question)
    print(answer)


if __name__ == "__main__":
    main()
```
Running the Example
- Install DeepSearcher: `pip install deepsearcher`
- Sign up for a FireCrawl API key at firecrawl.dev
- Replace the placeholder API keys with your actual keys
- Change the `website_url` to the website you want to crawl
- Run the script: `python load_website_using_firecrawl.py`
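The placeholder keys in the example are not valid, so a forgotten replacement only surfaces once the script starts calling the services. A small pre-flight check can catch this earlier. The helper below is hypothetical and not part of DeepSearcher; it assumes that an unset or empty value, or a value still containing `*` (as in the masked placeholders), means the key has not been configured.

```python
import os

# Hypothetical list of keys this example needs (not a DeepSearcher API).
REQUIRED_KEYS = ("OPENAI_API_KEY", "FIRECRAWL_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required keys that are unset or still look like placeholders.

    Assumption: a value containing '*' is a masked placeholder such as 'sk-****'.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name) or "*" in env[name]]
```

For example, calling `missing_api_keys()` at the start of `main()` and exiting with a message when the returned list is non-empty stops the run before any crawling begins.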
Advanced Crawling Options
FireCrawl provides several advanced options for crawling, passed as keyword arguments to `load_from_website`:
- `max_depth`: Control how many links deep the crawler should go
- `limit`: Set a maximum number of pages to crawl
- `allow_backward_links`: Allow the crawler to navigate to parent/sibling pages
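To make the interplay of these options concrete, here is a toy breadth-first crawl over an in-memory link graph. It is a simplified model built from the descriptions above, not FireCrawl's implementation, and FireCrawl's exact semantics may differ. Assumptions: `max_depth` counts links followed from the start page, `limit` caps the number of pages returned, and a "backward" link is modeled as any same-site link outside the starting URL's path. The URLs in `LINKS` are hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical site: each page maps to the links found on it.
LINKS = {
    "https://example.com/docs/": ["https://example.com/docs/install", "https://example.com/"],
    "https://example.com/docs/install": ["https://example.com/docs/install/linux", "https://example.com/docs/"],
    "https://example.com/": ["https://example.com/blog"],
}


def _under_path(url, base):
    """True if url is on the same host as base and under base's path."""
    u, b = urlparse(url), urlparse(base)
    return u.netloc == b.netloc and u.path.startswith(b.path)


def crawl(start_url, links, max_depth, limit, allow_backward_links=False):
    """Toy breadth-first crawl (not FireCrawl's algorithm); returns URLs in visit order."""
    start_host = urlparse(start_url).netloc
    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue and len(visited) < limit:  # limit: stop after this many pages
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # max_depth: do not follow links any further from here
        for nxt in links.get(url, []):
            if nxt in seen or urlparse(nxt).netloc != start_host:
                continue  # already queued, or a different site
            if not allow_backward_links and not _under_path(nxt, start_url):
                continue  # outside the starting path: a "backward" link
            seen.add(nxt)
            queue.append((nxt, depth + 1))
    return visited
```

Starting from `https://example.com/docs/` with `max_depth=2`, the crawl stays under `/docs/` by default. With `allow_backward_links=True` it also reaches `https://example.com/` and, one level further, `https://example.com/blog`. Lowering `limit` truncates the result regardless of depth.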
Key Concepts
- Web Crawling: Extracting content from websites
- Depth Control: Managing how deep the crawler navigates
- URL Processing: Handling multiple pages from a single starting point
- Vector Storage: Storing the crawled content in a vector database for search
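The vector storage idea can be illustrated with a tiny in-memory stand-in: each crawled page is turned into a vector, and a question is answered by ranking stored pages by similarity. This is a toy sketch, not how DeepSearcher or Milvus work internally. As an assumption for illustration, it uses bag-of-words term counts and cosine similarity in place of a real embedding model and vector index, and `ToyVectorStore` is a hypothetical class.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': term counts. A real pipeline uses an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database collection."""

    def __init__(self):
        self._rows = []  # list of (url, text, vector)

    def insert(self, url, text):
        """Store a crawled page together with its vector."""
        self._rows.append((url, text, embed(text)))

    def search(self, query, top_k=1):
        """Return the top_k (url, text) pairs most similar to the query."""
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[2]), reverse=True)
        return [(url, text) for url, text, _ in ranked[:top_k]]
```

After inserting a page that says "Milvus is an open-source vector database" and an unrelated page, searching for "What is Milvus?" ranks the Milvus page first because it shares the terms "is" and "milvus" with the question.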