In 2020, Foursquare and Factual merged, bringing together complementary offerings. The merger created the opportunity to integrate the two legacy POI datasets: Factual Places and Foursquare Venues (City Guide). We’ve unified Factual’s best-in-class methods for collecting POI data and core attributes with Foursquare’s strength in collecting fresh first-party and user-generated content. The result is an improvement in the accuracy, freshness, and depth of coverage of our POI data.
Factual brings high accuracy and fill rates, chain coverage, and consistency:
- Data aggregated from 46K unique sources
- 99%+ fill rate for US chains
- 95%+ of global venues with geocodes
Foursquare brings data freshness, user-generated content, and venue popularity insights from around the world:
- 2 million updates per month from first-party app users
- Over a billion user-generated photos, tips, and reviews
The integration process involved ingesting over a billion inputs from trusted Factual sources and 2 million monthly user-generated updates from Foursquare. We then identified and deleted duplicates to reduce noise and maximize quality. Finally, we matched venues between the two datasets using a single data processing pipeline built with the best of our ML methodologies.
The result is a new Integrated Places dataset that now drives the new Places API V3. The integration improves overall data quality, increases the number of third-party sources, and strengthens our ability to update venues with structured data. The new Integrated Places dataset is a catalog of more than 100M of the most popular places to shop, eat, and play in over 200 countries.
Besides the increase in volume, the most significant change affecting Places API results is the inclusion of new Quality Filtering rules that improve the overall quality of the dataset. These rules are built using machine learning models and human verification that continuously refine the dataset. Quality Filtering aims to remove fake venues, private businesses, people, things, venues that are now likely closed, and venues with only a single source or low-quality sources. Our QA process also includes:
- Detailed QA for deep data analysis: We sample chains and randomly selected POIs in major markets to validate their existence and accuracy
- Ground truth comparisons for key metropolitan areas: We demonstrate our data accuracy against data on the ground, verified by real people
- Spot reviews for each data release: We focus on data corruption, any significant changes to the dataset size, and randomized samples to spot check each release
What has changed with the v3/search logic compared to v2/search/recommendations and v2/venues/search?
The new v3/search endpoint more closely mimics the v2/search/recommendations logic and was built to similar specifications regarding behavior. The logic was reworked solely to improve efficiency and to reduce the number of parameters required, such as the sort preferences. There is a slight change in how these parameters interact with the response payloads. We no longer support the following parameters: intent, llAcc, alt, altAcc, and personalization. We are looking into adding the following v2/ parameters into v3/search at a later date: prices, open now, local day/time, and features.
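When migrating an existing v2 query, it can help to strip the parameters listed above as no longer supported before building the v3/search request. The following is a minimal sketch under stated assumptions: the helper name `split_v2_params` and the sample query values are hypothetical, and only the five unsupported parameter names come from the text above. It makes no API call and does not rename any parameters that may differ between versions.

```python
# Parameters listed above as no longer supported in v3/search.
UNSUPPORTED_V3_PARAMS = {"intent", "llAcc", "alt", "altAcc", "personalization"}


def split_v2_params(v2_params):
    """Split a v2 query dict into (kept, dropped).

    kept    -- parameters that are not on the unsupported list
    dropped -- sorted names of the parameters that were removed
    Hypothetical helper: it filters names only and validates nothing else.
    """
    kept = {k: v for k, v in v2_params.items() if k not in UNSUPPORTED_V3_PARAMS}
    dropped = sorted(k for k in v2_params if k in UNSUPPORTED_V3_PARAMS)
    return kept, dropped


# Illustrative v2-style query; the values are made up for this example.
v2_query = {"ll": "40.7,-74.0", "query": "coffee", "intent": "browse", "llAcc": 10}
kept, dropped = split_v2_params(v2_query)
print("carry forward:", kept)
print("removed:", dropped)
```

Logging the dropped names, rather than discarding them silently, makes it easier to spot queries whose behavior depended on a parameter such as intent and may need review after migration.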
Overall, the main engineering change is that we've rebuilt our Search Endpoint on a new architecture for faster response times. The new endpoint delivers higher quality results more quickly. The other endpoints, such as Details, Photos, and Tips, are also upgraded with improved spam filtering and smaller response payloads.
The largest factor driving the difference in search results between identical queries to the Places API V2 and the new Places API V3 is the Integrated Places dataset. Places API V2 queries the Foursquare Venues dataset that supports our City Guide app. The new Places API V3 queries the new Integrated Places dataset, which includes all of the POIs from Factual that were not present in the legacy Foursquare Venues database. Some locations will see a much higher volume of search results. Others might show a similar or lower volume, depending on the parameters and the overall quality of the data in that location.
Please contact our Support team with any questions.
Updated 11 months ago