Real Estate Listing Scraper (HouseSigma & Toronto Market)
Upwork

Remote
•1 hour ago
About
This is a request for a web developer/data engineer to create a robust, automated web scraping program. The primary goal is to continuously scrape new residential listings from HouseSigma and other key Toronto real estate websites, store the data in a structured format, and implement an alert system based on specific property criteria.

Job Title: Automated Real Estate Listing Scraper & Alert System (Python/Data Engineering)
Project Type: Ongoing, milestone-based (with potential for maintenance/expansion)

Skills Required:
• Expert Python development (including experience with scraping libraries like Beautiful Soup, Scrapy, or Playwright/Selenium for dynamic content).
• Experience with the Google Sheets API or database integration (e.g., PostgreSQL, MongoDB).
• Familiarity with web scraping best practices, including handling CAPTCHAs, proxies, and respecting robots.txt.
• Experience with email automation/notification (e.g., using smtplib or a dedicated service).
• Strong understanding of data structuring and cleaning.

Project Goals & Deliverables
The core deliverable is a fully functional, automated program that performs the following tasks:

1. Web Scraping & Data Extraction
Target Site: HouseSigma (primary target) and other relevant, publicly available Toronto real estate listing sites. The initial focus is on HouseSigma; further sites will be added later.
Data Points to Scrape:
• Address (Full)
• Listing Price
• Lot Depth (in feet/meters)
• Lot Frontage (in feet/meters)
• Listing URL
Frequency: The scraper must be designed to run at regular, automated intervals (e.g., every 30-60 minutes) to capture new listings quickly.

2. Data Storage & Database Creation
Storage Medium: The scraped data must be stored in either a Google Sheet or a hosted database (e.g., AWS RDS, MongoDB Atlas). Please specify your preferred database solution in your proposal.
Data Integrity: The program must check for and avoid duplicate listings to ensure the database contains only unique properties.
Database Goal: The stored data will serve as a long-term database of all scraped listings for future analysis and mapping.

3. Real-Time Alert System
The system must check every new listing against a defined set of criteria.
Criteria: The system needs to check for a combination of:
• Minimum Lot Depth
• Minimum Lot Frontage
• Maximum Listing Price
Notification: If a listing matches ALL defined criteria, an automated alert must be sent to the client via:
• Email: Containing the relevant details and the direct URL.
OR
• Simple Dashboard: (Optional, but preferred) A simple web-based dashboard showing matched listings.

To Apply, Please Include:
• A brief description of your prior experience building complex web scrapers or data pipelines.
• Your proposed technical stack (e.g., Python, Scrapy, Google Sheets API, etc.).
• How you plan to handle dynamic content/anti-scraping measures on sites like HouseSigma.
• Your estimated timeline and cost for the initial setup and deployment.
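
For applicants, the duplicate check in section 2 and the "matches ALL criteria" rule in section 3 could look roughly like the sketch below. This is only an illustration, not a required design: the names Listing, Criteria, matches and process_batch are hypothetical, the listing URL is assumed to be a usable unique key, and lot dimensions are assumed to be normalized to feet before comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    # The five data points named in the posting; feet is an assumed unit.
    address: str
    price: float
    lot_depth_ft: float
    lot_frontage_ft: float
    url: str


@dataclass(frozen=True)
class Criteria:
    min_lot_depth_ft: float
    min_lot_frontage_ft: float
    max_price: float


def matches(listing: Listing, criteria: Criteria) -> bool:
    """True only if the listing satisfies ALL three criteria."""
    return (
        listing.lot_depth_ft >= criteria.min_lot_depth_ft
        and listing.lot_frontage_ft >= criteria.min_lot_frontage_ft
        and listing.price <= criteria.max_price
    )


def process_batch(scraped, seen_urls, criteria):
    """Return (new_listings, alerts) and record new URLs in seen_urls.

    Assumption: a listing is a duplicate if its URL was already seen.
    """
    new_listings, alerts = [], []
    for listing in scraped:
        if listing.url in seen_urls:
            continue  # already stored in an earlier run or earlier in this batch
        seen_urls.add(listing.url)
        new_listings.append(listing)
        if matches(listing, criteria):
            alerts.append(listing)
    return new_listings, alerts
```

In a real deployment, seen_urls would be backed by the chosen store (a Google Sheet column or a unique database index) rather than an in-memory set, and the alerts list would feed the email or dashboard step.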