Wednesday, May 21, 2008

GeoWeb Grid Lock

James Fee's recent post Don't Give Away the Farm! regarding the news that "Google and ESRI will allow indexing of ArcGIS Server services by Google (and anyone who crawls the web)" got me thinking. I could not help but wonder whether those hosting ArcGIS Server solutions are prepared for the potential onslaught of traffic to their services once Google users begin scraping their data.

This reminds me of a presentation I attended several years ago at the ESRI Annual Users Conference in San Diego. A federal agency had published an ArcIMS-powered site showing current wildfires in the US. The site received an enormous number of hits (for an ArcIMS web mapping site), far more than was expected and far more than the servers could support. The agency worked with ESRI to scale the backend infrastructure to handle the unexpectedly high traffic.

I can foresee this scenario happening to data providers once they allow their content to be crawled by Google. I would guess most small to medium sized organizations will be ill prepared for the potentially enormous amount of traffic their services will receive if they happen to have content the public wants.


Rob said...

Google indexes our websites on a regular basis and the load is equal to a single user surfing around. Why does indexing a GIS system use more bandwidth than indexing a website?

MtnMaven said...

There is nothing inherently unique about the content being GIS related. I am not concerned about Google indexing my GIS content so that others can find my data when searching in GE. I am concerned that now that a vastly larger audience can find my data, my systems will not be able to handle the traffic, and that I will have little ability in the short term to anticipate the potential traffic generated via GE. As more AGS users allow their sites to be crawled by GE, a body of knowledge will be created to help form some best practices for hardware sizing. Until then we are left grasping at straws.

Based on my understanding of the architecture, if a person using Google Earth (GE) browses my ArcGIS Server (AGS) data, the content would be drawn from my AGS-powered site and rendered in GE. Each request from the GE user would result in transactions on my AGS server.

Rendering GIS data within GE is like a unique hit to a website, but it is not a simple HTML page with images. Instead, fairly complex SQL queries are required, potentially against data with hundreds of thousands or millions of rows. The SQL statements query the RDBMS for records based on proximity, point in polygon, polygon in polygon, etc.
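To give a feel for why these requests cost more than serving a static page, here is a minimal sketch (in plain Python, purely illustrative; it is not how AGS or any RDBMS actually implements its spatial engine) of a point-in-polygon test, the kind of geometric predicate a spatial query has to evaluate per candidate feature:

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: a point is inside a polygon if a ray cast
    from it crosses the polygon's edges an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # next vertex, wrapping around
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# A unit test with a simple square
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))  # True
print(point_in_polygon((5, 5), square))  # False
```

Multiply that per-feature work across every feature touched by the map extent, for every pan and zoom from every GE user, and the gap between this and serving cached HTML becomes obvious. Real spatial databases mitigate the cost with spatial indexes, but the load is still far heavier than a crawler walking ordinary web pages.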