Google Maps plan

1. Story

Title
More than one million sunken businesses

Story
A study of Google Maps data found that roughly 2.7% of all businesses analyzed were located in the sea.

Some are near the shore, which suggests human error.
Some are scattered in the middle of the ocean, which suggests the offshore location was set intentionally.
But most are clustered at a few coordinates: a single cluster holds 73% of that 2.7%, all sharing one coordinate in the North Pacific Ocean.

Questions that come to mind:

  • How did so many businesses end up sharing the same spot in the Pacific Ocean according to Google Maps?
  • How is that affecting their reach and revenue?
  • Are the business owners aware of this?
  • Are there owners who place their businesses in the ocean by choice? If so, what are they hiding? Or is there a benefit to it?

Next
To answer these questions, we need to contact Google and/or some of the top businesses.

2. Analysis

  • Total businesses in the dataset: 4986146
  • Number of businesses in the water: 133292
  • Percentage: 2.67%
  • Link to data: https://drive.google.com/drive/folders/1GZdYbNbyDjhy8DDp2_0KJZObmwAJWs7y?usp=sharing
  • According to clearlypayments, there are 33.2 million businesses in the United States; extrapolating the same rate to all of them gives an estimated 887518 businesses in the sea
  • Coordinates distribution

    Coordinates (lat, lon)         Count   Percentage (%)
    46.423669, -129.9427086        97897   73.44
    0, 0                            5702    4.27
    27.698638, -83.804601           4149    3.11
    37.878638, -122.4203375         3326    2.49
    37.7848269, -122.7073054        1030    0.77
    33.8256055, -118.641338          673    0.50
    14.1576412, -106.6918595         562    0.42
    40.419584, -73.6754126           427    0.32
    41.7993125, -70.3086624          335    0.25
    34.032332, -119.134398           325    0.24
  • States distribution

    State            Count   Percentage (%)
    California       25124   18.85
    Florida          19333   14.50
    New York         10294    7.72
    Texas             9491    7.12
    Ohio              5052    3.79
    Pennsylvania      4058    3.04
    Georgia           3729    2.80
    Massachusetts     3543    2.66
    Missouri          3234    2.43
    Illinois          3182    2.39
  • Business types distribution

    Business type                 Count   Percentage (%)
    Marketing agency              12282    9.21
    Marketing consultant           3607    2.70
    Internet marketing service     3130    2.34
    Interior designer              2649    1.98
    House cleaning service         1656    1.24
    Website designer               1562    1.17
    Electrician                    1426    1.06
    Construction company           1300    0.97
    Tutoring service               1186    0.88
    Painter                        1139    0.85
  • Plot: full interactive graph
    The HTML file needs to be downloaded and opened in a browser.
    Example of the interactive plot
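
The percentage and the nationwide estimate in the list above can be re-derived from the raw counts. This is a minimal sketch of that arithmetic only: the constant names are illustrative, and it assumes the in-water rate observed in the dataset applies uniformly to all 33.2 million businesses.

```python
# Counts taken from the analysis list above.
TOTAL_BUSINESSES = 4_986_146   # businesses in the dataset
IN_WATER = 133_292             # businesses whose coordinates fall in the water
US_BUSINESSES = 33_200_000     # clearlypayments figure for the United States

# Share of the dataset located in the water.
water_share = IN_WATER / TOTAL_BUSINESSES

# Extrapolation, assuming the same share holds for every US business.
estimated_in_water = water_share * US_BUSINESSES

print(f"Share in water: {water_share * 100:.2f}%")
print(f"Estimated businesses in the sea: {round(estimated_in_water)}")
```

The extrapolation is only as good as that uniformity assumption; the scraped search terms may not be a representative sample of all businesses.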

3. Time-line

3.1. Initiating

In late 2024, I was scraping Google Maps for leads to send my resume to.
The steps were:

  • Visit Google Maps.
  • In the search bar, type a search term with this structure: [Business type in City, State, Country]
    Example: [Marketing agencies in Maitland, Florida, United States]
  • Save the resulting page as an HTML file.
  • Parse the HTML files into clean data.
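
The search terms in the steps above follow one fixed pattern, so they can be generated instead of typed by hand. A minimal sketch, assuming hypothetical helper names (`build_search_term`, `build_search_terms`) that are not taken from the original scraper:

```python
from itertools import product
from typing import Iterable, List, Tuple


def build_search_term(business_type: str, city: str, state: str,
                      country: str = "United States") -> str:
    """Format one term as 'Business type in City, State, Country'."""
    return f"{business_type} in {city}, {state}, {country}"


def build_search_terms(business_types: Iterable[str],
                       places: Iterable[Tuple[str, str]]) -> List[str]:
    """Combine every business type with every (city, state) pair."""
    return [
        build_search_term(business_type, city, state)
        for business_type, (city, state) in product(business_types, places)
    ]


terms = build_search_terms(
    ["Marketing agencies", "Electricians"],
    [("Maitland", "Florida")],
)
print(terms[0])
```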

3.2. Discovery

After I parsed the HTML files, I started doing some exploratory analysis and found that some websites end with a ".ca" domain. This means that some businesses are not based in the USA, even though I specified "…United States" in the search term.

I could have just removed the businesses whose website ends with ".ca", but how would I know whether a business with a ".com" website is based in the USA?

Business in     .ca   .us   .com
United States   x     o     o
Canada          o     x     o
Any             x     x     o

I couldn't get the full address from the parsed HTML files either.

Then I noticed that the HTML contained a link, which I named "g_url", that includes coordinates.
Here is an example of one g_url:
https://www.google.com/maps/place/Scott+Le+Roy+Marketing/data=!4m7!3m6!1s0x88e77b0123b825cb:0x90132c8aad112338!8m2!3d46.423669!4d-129.9427086!16s%2Fg%2F12cnpg2gj!19sChIJyyW4IwF754gROCMRrYosE5A?authuser=0&hl=en&rclk=1

Here are the coordinates extracted from that g_url: [46.423669, -129.9427086]
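
In the example g_url above, the latitude follows `!3d` and the longitude follows `!4d`. A minimal sketch of the extraction, assuming every g_url carries the pair in that same form (the function name `extract_coordinates` is illustrative, not from the original code):

```python
import re
from typing import Optional, Tuple

# Latitude after "!3d", longitude after "!4d", as seen in the example g_url.
COORD_PATTERN = re.compile(r"!3d(-?\d+(?:\.\d+)?)!4d(-?\d+(?:\.\d+)?)")


def extract_coordinates(g_url: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from a g_url, or None if absent."""
    match = COORD_PATTERN.search(g_url)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


example = (
    "https://www.google.com/maps/place/Scott+Le+Roy+Marketing/data="
    "!4m7!3m6!1s0x88e77b0123b825cb:0x90132c8aad112338!8m2"
    "!3d46.423669!4d-129.9427086!16s%2Fg%2F12cnpg2gj"
    "!19sChIJyyW4IwF754gROCMRrYosE5A?authuser=0&hl=en&rclk=1"
)
print(extract_coordinates(example))
```

Returning `None` instead of raising keeps a bulk run going when a record has no coordinates in its link.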

I had the idea of using the coordinates to look up the address through the OpenStreetMap API.

While using this method, I started getting address errors for some coordinates, and after some investigation I found that those businesses were located in the North Pacific Ocean.
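
The address lookup can be sketched with OpenStreetMap's public Nominatim reverse-geocoding endpoint. This is an assumption about which service was used, since the text only says "OpenStreetMap API"; the function names and the `User-Agent` value are illustrative as well.

```python
import json
import urllib.parse
import urllib.request

# Public Nominatim reverse-geocoding endpoint (assumed service).
NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"


def reverse_url(lat: float, lon: float) -> str:
    """Build the reverse-geocoding URL for one coordinate."""
    query = urllib.parse.urlencode({"lat": lat, "lon": lon, "format": "jsonv2"})
    return f"{NOMINATIM_REVERSE}?{query}"


def reverse_geocode(lat: float, lon: float,
                    user_agent: str = "sunken-businesses-example") -> dict:
    """Fetch and decode the JSON response for one coordinate."""
    request = urllib.request.Request(
        reverse_url(lat, lon),
        headers={"User-Agent": user_agent},  # identifies the client to the service
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


print(reverse_url(46.423669, -129.9427086))
```

A response that carries no address for a coordinate is the kind of signal that prompted the investigation described above.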


3.3. Initiating bigger project

I decided to come back to this issue months after I found it. This time, I made the search area bigger by adding more search terms.
This time using:

The result was a database of almost five million business records. I had to find a way to check which ones are on land and which are in the ocean.
Making five million requests to the OpenStreetMap API wasn't an option because of how slow it would be, so I had to find another method.
The first method was a library that, given a coordinate, returns whether it is on land or in the ocean:
https://github.com/toddkarin/global-land-mask/tree/master
The library sometimes failed to return the correct answer, but I came up with a way to work around that.
I plotted the coordinates that the library marked as "in water" on a map and saved the images.
Example
PS: each side of the images is 220 meters (about 0.14 miles).

  1. Case 1


  2. Case 2


  3. Case 3


  4. Case 4


Cases 1, 2, and 3 are more or less artificial islands, which is why the library did not detect them as land.
I filtered out these cases and similar ones, and got a dataset of Google Maps locations that are in the water but not on an artificial island.
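
The first pass of the land/water check described in this section can be sketched as below. `globe.is_land(lat, lon)` is the global-land-mask call; `split_land_water` and `default_is_land` are hypothetical helper names. The classifier is passed in as a parameter, so a manual override list for the artificial-island cases could wrap it later.

```python
from typing import Callable, Iterable, List, Tuple

Coord = Tuple[float, float]


def split_land_water(
    coords: Iterable[Coord],
    is_land: Callable[[float, float], bool],
) -> Tuple[List[Coord], List[Coord]]:
    """Partition (lat, lon) pairs into (on_land, in_water) lists."""
    on_land: List[Coord] = []
    in_water: List[Coord] = []
    for lat, lon in coords:
        (on_land if is_land(lat, lon) else in_water).append((lat, lon))
    return on_land, in_water


def default_is_land(lat: float, lon: float) -> bool:
    """Classifier backed by global-land-mask (needs the package installed)."""
    from global_land_mask import globe  # imported lazily on first use
    return bool(globe.is_land(lat, lon))
```

With the package installed, `split_land_water(coords, default_is_land)` yields the "in water" candidates that are then plotted and checked by eye.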

Date: 2025-01-09 Thu 20:29

Created: 2025-07-03 Thu 23:50
