
Fixed OSM Venue Duplication Bug

Date: 2026-01-23
Status: ✅ Complete
Related Feature(s): Venue Import from OpenStreetMap

Problem Statement

Importing new OSM (OpenStreetMap) venues created duplicate records in Firestore. For example, "Zia Gourmet Pizza" had two identical entries with the same coordinates (32.7632377, -117.1225411) and the same OSM ID (596593779).

Root Cause: A field name mismatch in the duplicate detection logic. The buildVenueFromOSM function creates venues with lat and lng fields, but the checkDuplicates() function in the import script checked for latitude and longitude. Because those fields were always undefined, the coordinate-based duplicate check was skipped without raising any error, and duplicates slipped through.
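The silent failure can be reproduced in isolation. The sketch below assumes a venue object shaped like the output of buildVenueFromOSM, with only the name, osmId, lat and lng fields filled in (values taken from the Zia Gourmet Pizza example); the other fields the real builder writes are omitted. Reading the non-existent latitude/longitude properties yields undefined, so the old guard is falsy and the coordinate comparison never runs.

```javascript
// Hypothetical venue shaped like buildVenueFromOSM output (lat/lng, not latitude/longitude).
const venue = {
  name: "Zia Gourmet Pizza",
  osmId: 596593779,
  lat: 32.7632377,
  lng: -117.1225411,
};

// Old guard: venue.latitude and venue.longitude are undefined, so this is always false.
const oldGuardRuns = Boolean(venue.latitude && venue.longitude);

// Fixed guard: reads the fields the builder actually writes.
const newGuardRuns = Boolean(venue.lat && venue.lng);

console.log(oldGuardRuns, newGuardRuns); // false true
```

No exception is thrown in either case, which is why the bug surfaced only as duplicate records rather than as an import error.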

Solution Overview

Reviewed field name handling across three scripts:

  1. scripts/import-venues-osm.mjs - Updated the checkDuplicates() function to use the correct field names (lat/lng)
  2. scripts/deduplicate-venues.mjs - Added a fallback that accepts both field name variants, for robustness
  3. scripts/check-duplicate-venues.mjs - Already supported both variants; left unchanged

Files Changed

Changes Made

import-venues-osm.mjs

```javascript
// Before: using incorrect field names
if (v.latitude && v.longitude) { ... }

// After: using the field names written by buildVenueFromOSM
if (v.lat && v.lng) { ... }
```

All three occurrences, in the checkDuplicates() function and in the duplicate-skip logging, were updated.

deduplicate-venues.mjs

```javascript
// Before: only checking lat/lng
const coordKey = `${data.lat?.toFixed(6)},${data.lng?.toFixed(6)}`

// After: supporting both field name variants for robustness
const lat = data.latitude ?? data.lat
const lng = data.longitude ?? data.lng
const coordKey = `${lat?.toFixed(6)},${lng?.toFixed(6)}`
```
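The coordKey change above feeds a grouping step. The following is a minimal sketch of that step, assuming records are plain objects carrying either field name variant; the function names are hypothetical and not taken from deduplicate-venues.mjs. Records are bucketed by coordinates rounded to 6 decimal places (roughly 0.1 m), and every record beyond the first in a bucket counts as a duplicate, which matches the arithmetic in the testing results (2,350 - 466 = 1,884). Unlike the excerpt, this sketch skips records that lack coordinates: with optional chaining, records missing both values would all produce the key "undefined,undefined" and land in a single group.

```javascript
// Hypothetical coordinate key that accepts both field name variants.
// Returns null when either coordinate is missing or not a number.
function coordKey(data) {
  const lat = data.latitude ?? data.lat;
  const lng = data.longitude ?? data.lng;
  if (typeof lat !== "number" || typeof lng !== "number") return null;
  return `${lat.toFixed(6)},${lng.toFixed(6)}`;
}

// Buckets records by rounded coordinates; records without coordinates are skipped.
function groupByCoordinates(records) {
  const groups = new Map();
  for (const record of records) {
    const key = coordKey(record);
    if (key === null) continue;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(record);
  }
  return groups;
}

// Every record beyond the first in a group is a duplicate to remove.
function countCoordinateDuplicates(groups) {
  let count = 0;
  for (const group of groups.values()) count += group.length - 1;
  return count;
}
```

For example, one record stored with lat/lng and another stored with latitude/longitude at the same position fall into the same group and count as one duplicate.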

Testing Results

Before Fix:

  • Total venues: 2,350
  • Coordinate duplicates: 466

After Deduplication:

  • Total venues: 1,884 (466 duplicates removed)
  • Coordinate duplicates: 0
  • Name duplicates: 80 (legitimate: the same business name at different locations)

Verified Examples:

  • "Zia Gourmet Pizza": no longer duplicated ✅
  • "Station Tavern": deduplicated ✅
  • All other duplicated coordinate pairs: verified removed ✅

Validation

  • ✅ Code validation: npm run validate passed
  • ✅ Coordinate duplicates: eliminated
  • ✅ No data loss: in each duplicate group, the record with the best data quality was kept, so all essential information is preserved
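This note does not define "best data quality". One plausible reading, sketched here purely as an assumption, is to score each record by how many fields hold non-empty values and keep the highest scorer in each duplicate group. The scoring rule and the names qualityScore and pickBest are hypothetical.

```javascript
// Hypothetical quality score: the number of fields holding a non-empty value.
function qualityScore(record) {
  return Object.values(record).filter(
    (value) => value !== null && value !== undefined && value !== ""
  ).length;
}

// Splits one duplicate group into the record to keep and the records to delete.
// On a tie, the earliest record in the group is kept.
function pickBest(group) {
  let best = group[0];
  for (const record of group.slice(1)) {
    if (qualityScore(record) > qualityScore(best)) best = record;
  }
  return { keep: best, remove: group.filter((record) => record !== best) };
}
```

Keeping a single record can still drop a field that only a removed record had; copying non-empty fields from the removed records into the kept one is a stricter alternative if that matters.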

How It Prevents Future Duplicates

With the field name fix in place, the OSM import script will now correctly:

  1. Detect existing venues by osmId - Prevents re-importing same OSM record
  2. Detect existing venues by coordinates - Prevents importing different OSM records at same location
  3. Skip duplicates during import - No new duplicates will be created
  4. Merge when appropriate - Updates existing records with new OSM data instead of creating duplicates
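The four behaviors listed above can be combined into one decision per incoming venue. The sketch below rests on an assumed policy, since this note does not say when a match is merged rather than skipped: an osmId match is treated as the same OSM element and merged with the newer data, while a coordinate-only match is skipped. The names planImport and coordinateKey are hypothetical. Newly created venues are added to the lookup maps as well, so duplicates within a single import batch are also caught.

```javascript
// Hypothetical sketch; the real import script's merge/skip rules may differ.
// Rounded coordinate key, or null when either coordinate is missing.
function coordinateKey(v) {
  if (typeof v.lat !== "number" || typeof v.lng !== "number") return null;
  return `${v.lat.toFixed(6)},${v.lng.toFixed(6)}`;
}

// Plans one action per incoming OSM venue: "merge", "skip", or "create".
function planImport(candidates, existingVenues) {
  const byOsmId = new Map();
  const byCoords = new Map();
  const remember = (v) => {
    if (v.osmId != null) byOsmId.set(v.osmId, v);
    const key = coordinateKey(v);
    if (key !== null && !byCoords.has(key)) byCoords.set(key, v);
  };
  existingVenues.forEach(remember);

  const plan = [];
  for (const candidate of candidates) {
    const key = coordinateKey(candidate);
    if (candidate.osmId != null && byOsmId.has(candidate.osmId)) {
      // Same OSM element already stored: the candidate's fields override the stored ones.
      const target = byOsmId.get(candidate.osmId);
      plan.push({ action: "merge", target, data: { ...target, ...candidate } });
    } else if (key !== null && byCoords.has(key)) {
      // Different OSM element at an occupied location: skip it.
      plan.push({ action: "skip", target: byCoords.get(key), candidate });
    } else {
      plan.push({ action: "create", data: candidate });
      // Index the new venue so later candidates in the same batch see it.
      remember(candidate);
    }
  }
  return plan;
}
```

Separating planning from writing also makes a dry run straightforward: the plan can be logged and reviewed before anything is written to Firestore.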

Recommendation

In future OSM imports, if you see any new duplicates appear:

  1. First verify the fix is working: npm run check:venues
  2. If duplicates exist, run: npm run deduplicate:venues -- --execute
  3. Use the --consolidate flag during import for automatic area-based deduplication

Next Steps

  • Monitor future OSM imports to confirm no new duplicates
  • Consider adding a pre-import duplicate check as a CI validation step
