LLM Prefilter Usage Guide
The LLM prefilter automatically identifies irrelevant job postings and routes them to a separate filtered_jobs table, keeping only relevant jobs in the main job_listings table.
Quick Start
1. Job Search with Prefiltering
```bash
# Enable prefiltering during job ingestion
python -m flows.ingest "Python Developer" "Remote" linkedin 50 --prefilter --prefilter-threshold 0.45
# Use a custom threshold (higher = stricter: 0.0 keeps every job, 1.0 filters nearly everything)
python -m flows.ingest "Data Engineer" "San Francisco" linkedin 25 --prefilter --prefilter-threshold 0.6
```
What happens:
- Jobs with relevance score ≥ threshold → saved to job_listings
- Jobs with relevance score < threshold → saved to filtered_jobs
- Prefilter reasoning stored in ai_model_reasoning field
- Raw prefilter data stored in raw_payload["prefilter"]
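The routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code inside flows.ingest: the Job class and the route_job helper are hypothetical, the scores and reasoning strings are made up, and the keys stored under raw_payload["prefilter"] (score, reasoning, threshold) are assumptions. It does make the boundary explicit: a score exactly equal to the threshold is kept.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Job:
    # Hypothetical stand-in for a job row; only fields named in this guide.
    title: str
    ai_model_reasoning: Optional[str] = None
    raw_payload: Dict[str, Any] = field(default_factory=dict)


def route_job(job: Job, score: float, reasoning: str, threshold: float) -> str:
    """Attach prefilter metadata to `job` and return its destination table."""
    job.ai_model_reasoning = reasoning
    # Assumed payload layout; the real keys may differ.
    job.raw_payload["prefilter"] = {"score": score, "reasoning": reasoning, "threshold": threshold}
    # Scores at or above the threshold stay; everything else is filtered out.
    return "job_listings" if score >= threshold else "filtered_jobs"


# Usage: route two scored postings with the 0.45 threshold from the first command.
tables: Dict[str, List[Job]] = {"job_listings": [], "filtered_jobs": []}
scored = [
    ("Senior Python Developer", 0.82, "Python backend role, remote"),
    ("Retail Sales Associate", 0.10, "Not a software role"),
]
for title, score, reasoning in scored:
    job = Job(title)
    tables[route_job(job, score, reasoning, threshold=0.45)].append(job)

print([j.title for j in tables["job_listings"]])   # ['Senior Python Developer']
print([j.title for j in tables["filtered_jobs"]])  # ['Retail Sales Associate']
```

With threshold 0.0 every non-negative score is kept, and with 1.0 only a perfect score survives, which is why raising the threshold makes filtering stricter.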
2. Autonomous Filtering of Existing Jobs
```bash
# Sweep existing jobs in dry-run mode to preview what would be filtered
python -m flows.prefilter sweep --threshold 0.45 --limit 500 --dry-run
# Actually move jobs (omit --dry-run); --days-back 30 restricts the sweep to jobs from the last 30 days
python -m flows.prefilter sweep --threshold 0.45 --limit 500 --days-back 30
# Process in smaller batches
python -m flows.prefilter sweep --threshold 0.5 --batch-size 10 --limit 100
```
What happens:
- Loads unprocessed jobs from the job_listings table
- Applies LLM prefiltering to each job
- Jobs below the threshold: moved from job_listings → filtered_jobs
- Jobs at or above the threshold: updated with prefilter metadata and kept in job_listings
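A sweep like the one described above could look roughly like the following sketch. Everything here is hypothetical: Listing, sweep and fake_score are illustrative names, plain Python lists stand in for the two database tables, a keyword match replaces the real LLM call, and "unprocessed" is assumed to mean "has no prefilter metadata yet". The parameters mirror the CLI flags (--threshold, --limit, --batch-size, --days-back, --dry-run), but their defaults are copied from the commands above, not taken from flows.prefilter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Listing:
    # Hypothetical row shape: just what the sweep needs.
    id: int
    title: str
    created_at: datetime
    prefilter: Optional[dict] = None  # assumption: None means "not yet prefiltered"


def sweep(
    job_listings: List[Listing],
    filtered_jobs: List[Listing],
    score_fn: Callable[[Listing], Tuple[float, str]],
    threshold: float = 0.45,
    limit: int = 500,
    batch_size: int = 10,
    days_back: Optional[int] = None,
    dry_run: bool = False,
    now: Optional[datetime] = None,
) -> Dict[str, List[int]]:
    """Score unprocessed jobs and move those below `threshold` to filtered_jobs."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_back) if days_back is not None else None
    # Unprocessed jobs only, optionally limited to the last `days_back` days.
    candidates = [
        job for job in job_listings
        if job.prefilter is None and (cutoff is None or job.created_at >= cutoff)
    ][:limit]

    report: Dict[str, List[int]] = {"kept": [], "moved": []}
    for start in range(0, len(candidates), batch_size):
        batch = candidates[start:start + batch_size]
        to_move: List[Listing] = []
        for job in batch:
            score, reasoning = score_fn(job)
            if score < threshold:
                report["moved"].append(job.id)
                to_move.append(job)
            else:
                report["kept"].append(job.id)
            if not dry_run:
                job.prefilter = {"score": score, "reasoning": reasoning}
        # A dry run only reports; otherwise move this batch's rejects.
        if not dry_run and to_move:
            moved_ids = {job.id for job in to_move}
            job_listings[:] = [j for j in job_listings if j.id not in moved_ids]
            filtered_jobs.extend(to_move)
    return report


# Usage with a keyword stub standing in for the LLM scorer.
def fake_score(job: Listing) -> Tuple[float, str]:
    relevant = any(word in job.title for word in ("Python", "Data"))
    return (0.9, "tech role") if relevant else (0.1, "not a tech role")


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
listings = [
    Listing(1, "Python Developer", now - timedelta(days=3)),
    Listing(2, "Forklift Operator", now - timedelta(days=5)),
    Listing(3, "Data Engineer", now - timedelta(days=60)),  # outside days_back=30
]
filtered: List[Listing] = []

preview = sweep(listings, filtered, fake_score, days_back=30, dry_run=True, now=now)
print(preview)  # {'kept': [1], 'moved': [2]}
result = sweep(listings, filtered, fake_score, days_back=30, now=now)
print([j.id for j in listings], [j.id for j in filtered])  # [1, 3] [2]
```

Moving rejects batch by batch means an interrupted sweep keeps the work it already finished; whether flows.prefilter commits per batch is not stated in this guide.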
Configuration
Enable by Default (config/settings.toml)
Optional Stoplist (disabled by default)
Database Tables
- job_listings: Relevant jobs (score ≥ threshold)
- filtered_jobs: Irrelevant jobs (score < threshold)
- Both tables have identical schema
- Duplicate detection works across both tables
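Cross-table duplicate detection amounts to checking both tables before inserting. The sketch below uses an in-memory SQLite database; the column set and the choice of job_url as the duplicate key are assumptions, and the real project may use a different database and key.

```python
import sqlite3

# Hypothetical columns; the guide only says both tables share one schema.
# job_url as the duplicate key is an assumption.
SCHEMA = "(id INTEGER PRIMARY KEY, job_url TEXT NOT NULL UNIQUE, title TEXT)"

conn = sqlite3.connect(":memory:")
for table in ("job_listings", "filtered_jobs"):
    conn.execute(f"CREATE TABLE {table} {SCHEMA}")  # same DDL for both tables


def is_duplicate(conn: sqlite3.Connection, job_url: str) -> bool:
    """Return True if job_url already exists in either table."""
    row = conn.execute(
        "SELECT 1 FROM job_listings WHERE job_url = ? "
        "UNION ALL "
        "SELECT 1 FROM filtered_jobs WHERE job_url = ? "
        "LIMIT 1",
        (job_url, job_url),
    ).fetchone()
    return row is not None


# A job previously routed to filtered_jobs still counts as seen.
conn.execute(
    "INSERT INTO filtered_jobs (job_url, title) VALUES (?, ?)",
    ("https://example.com/jobs/123", "Retail Sales Associate"),
)
print(is_duplicate(conn, "https://example.com/jobs/123"))  # True
print(is_duplicate(conn, "https://example.com/jobs/456"))  # False
```

Because filtered jobs are included in the check, a posting routed to filtered_jobs on one run is recognized as a duplicate on the next ingestion run rather than being inserted again.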
Examples
```bash
# High-precision filtering (fewer false positives: fewer irrelevant jobs kept, but some relevant ones may be filtered out)
python -m flows.ingest "AI Engineer" "Remote" --prefilter --prefilter-threshold 0.7
# Broad filtering (less aggressive: keeps more borderline jobs)
python -m flows.ingest "Software Engineer" "NYC" --prefilter --prefilter-threshold 0.3
# Dry run to see what would be filtered
python -m flows.prefilter sweep --dry-run --limit 50
# Clean up last week's jobs
python -m flows.prefilter sweep --days-back 7 --threshold 0.5
```
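Choosing between a strict and a broad threshold comes down to how many jobs each one keeps. The helper below is hypothetical (not part of the project): it applies the same score ≥ threshold rule to a list of made-up relevance scores and reports the keep/filter split for several candidate thresholds.

```python
from typing import Dict, Iterable, List


def kept_counts(scores: Iterable[float], thresholds: List[float]) -> Dict[float, int]:
    """Count how many jobs each threshold keeps (score >= threshold)."""
    scores = list(scores)
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


# Made-up relevance scores for eight postings.
sample_scores = [0.92, 0.81, 0.66, 0.52, 0.47, 0.35, 0.28, 0.12]
for threshold, kept in kept_counts(sample_scores, [0.3, 0.45, 0.5, 0.7]).items():
    print(f"threshold {threshold:.2f}: keep {kept}, filter {len(sample_scores) - kept}")
# threshold 0.30: keep 6, filter 2
# threshold 0.45: keep 5, filter 3
# threshold 0.50: keep 4, filter 4
# threshold 0.70: keep 2, filter 6
```

On the same input, 0.7 keeps a third as many jobs as 0.3; which trade-off fits depends on whether missing a relevant posting or reviewing an irrelevant one costs more.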
Monitoring
Check prefilter results: