tl;dr — I went from experimenting with mapping libraries to building a reusable mapping app. This is how I did it and how you can re-use it.
Intro
As a data scientist, most of my work stays behind the scenes. When I train models, the most exposure they usually get is a simple Flask web app that serves them as a REST API.
Lately, deployment tools have become friendlier, quicker and more efficient. If you want to build an app for users, even just for demo purposes, a minimal UI can be a good starting point.
Recently I stumbled upon this example of machine learning model deployment with a UI. It has tutorials for many deployment options and a basic UI for image recognition. Many data scientists have forked this example and used it for their own classification algorithms.
This time, I decided to step out of my comfort zone and extend this example a bit further.
The Concept
If you walk around in Tel Aviv, Israel, you may have seen some signs, such as this one:
This memorial plaque is located at 30 Ibn Gvirol Street, Tel Aviv, and says that David Perlov, a pioneering Israeli artist and filmmaker, lived and worked in this house.
Israel is a very young country, and as such, much of its historical art activity took place in a small area, in the first Israeli city, Tel Aviv.
If you are interested in art and in wandering, it may be fun to make yourself a self-guided tour around these waypoints.
Unfortunately, I was not able to find any online page that offers an interface for building such a touring route.
So, I decided to do it.
Not many of you know this, but I have long had the idea of using at least some of the tremendous amount of geographic data available online to create a kind of virtual tour guide that automatically generates walking routes in unfamiliar cities, or original walking routes in familiar ones. This task is a bit ambitious, so I decided to start small: these memorial plaques will be my opportunity to enter the mapping world.
Step 1: High level design
Although it is always tempting to get your hands dirty right away, I first needed to think about the end goal of this project. It will of course not be a full-fledged production app, but more of a toy prototype that allows the user some level of exploration.
More precisely, I would like the user to be able to:
1. See the plaques as markers on a map.
2. Get some info about them.
3. Get a simple traveling route between checkpoints.
4. Do all of the above in a user-accessible web app (not a Python notebook).
So I’ll need:
1. Data analysis tool
2. Mapping tool
3. Web frameworks: client and server side
4. Deployment service
A lot of stuff to do!
Of course these definitions are not rigorous, but they describe the minimal project I have in mind.
It is also a good time to choose a development strategy, a high-level choice that will accompany us throughout the work. Objectively, I think the best choice for such a task is a client-side language: JavaScript. However, as a data scientist my strongest language is Python, so I chose to use it as much as I could. This choice forces me to use a server, since Python can’t be used effectively within the browser.
Luckily, there are decent pythonic web frameworks such as Flask.
Step 2: Data analysis
To reach the above goal I have to address three tasks:
1. Getting the data
2. Visualizing it
3. Creating a UI
This page, http://cafe.themarker.com/post/3124888/, has quite a messy list of all the artists and their addresses.
It is easy to paste the list of artists and their addresses into a text file and parse it into friendlier tables.
Now comes the harder part.
I did the following:
1. Saved the data in 2 files
2. Items are separated with a full stop, so it’s easy to split them
3. Every item includes the name, address and a short description. These are separated with commas, but in a messier way than the items themselves
4. In total, there are ~180 artists in the list
5. In ~20% of the items, the commas don’t separate the fields cleanly
6. Some have a missing comma
7. Some have extra commas, mostly in the description
Solutions:
1. Since an address should contain a number, a digit can serve as an indicator of the address field
2. Split and cleaned the items into addresses and names (1 or 2 errors may have slipped through, but we’ll have to accept that)
3. Sent all the cleaned addresses to a geolocator API to get the location coordinates. Some (20) are not returned correctly for various reasons, so they have to be retrieved manually
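The cleaning steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual notebook code: the function names, the sample text (apart from the David Perlov plaque mentioned earlier) and the field order (name, address, description) are hypothetical. The digit check is the heuristic from solution 1.

```python
import re

# Hypothetical sample in the assumed raw format: items end with a full stop,
# fields inside an item are separated by commas.
SAMPLE = (
    "David Perlov, Ibn Gvirol 30, artist and film-maker. "
    "Artist B, Example Street 5, painter, sculptor. "
    "Artist C, poet."
)

def parse_item(item):
    """Split one raw item into (name, address, description).

    Heuristic: the address is the first field after the name that contains
    a digit. Returns None when no such field exists, so the item can be
    fixed by hand.
    """
    fields = [f.strip() for f in item.split(",") if f.strip()]
    for i, field in enumerate(fields):
        if i > 0 and re.search(r"\d", field):
            name = ", ".join(fields[:i])
            description = ", ".join(fields[i + 1:])  # extra commas land here
            return name, field, description
    return None

def parse_items(raw_text):
    """Return (parsed tuples, raw items that need manual handling)."""
    items = [s.strip() for s in raw_text.split(".") if s.strip()]
    parsed, failed = [], []
    for item in items:
        result = parse_item(item)
        if result is None:
            failed.append(item)
        else:
            parsed.append(result)
    return parsed, failed

if __name__ == "__main__":
    parsed, failed = parse_items(SAMPLE)
    print(parsed)
    print("needs manual cleanup:", failed)
```

Items with no digit at all end up in the second list, which matches the idea that a handful of entries still need manual attention.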
After getting the data, the next task is to extract the coordinates of the waypoints. Clearly, the correct way to work with geographical data is to translate seamlessly between addresses, which are user-friendly, and geolocations, which are computer-friendly.
This is possible with GeoPy, a package with geocoding functionality (place => geocode => place) that can save some Google API credits.
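A minimal sketch of this geocoding step might look like the following. It assumes GeoPy’s Nominatim geocoder (the post does not say which GeoPy geocoder was actually used), and the names `geocode_addresses` and `make_nominatim_geocoder`, the user agent string and the default city suffix are all hypothetical.

```python
import time

def geocode_addresses(addresses, geocode, city="Tel Aviv, Israel", delay=1.0):
    """Geocode each address with the given callable.

    `geocode` takes a query string and returns an object with `.latitude`
    and `.longitude`, or None when nothing is found (this is how GeoPy
    geocoders behave). Returns ({address: (lat, lon)}, [failed addresses]).
    """
    found, missing = {}, []
    for address in addresses:
        location = geocode(f"{address}, {city}")
        if location is None:
            missing.append(address)  # to be looked up manually
        else:
            found[address] = (location.latitude, location.longitude)
        time.sleep(delay)  # free geocoding services expect slow request rates
    return found, missing

def make_nominatim_geocoder(user_agent="plaque-map-demo"):
    """Build a GeoPy Nominatim geocode function (assumed choice of service)."""
    # Imported lazily so geocode_addresses can be used without GeoPy installed.
    from geopy.geocoders import Nominatim
    return Nominatim(user_agent=user_agent).geocode
```

Usage would be along the lines of `found, missing = geocode_addresses(addresses, make_nominatim_geocoder())`; whatever ends up in `missing` is what has to be retrieved manually.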
Step 3: Choosing our weapons
After taking care of most of the data work, the next step is map visualization and UI. Since I decided to use Python as my backend, I needed a Pythonic tool whose output can be turned into HTML. There are quite a few packages that can plot a map inline and save it as a (sometimes interactive) .html file. However, they have different features and capabilities. Some require the premium Google Maps API, which means some limitations unless you pay.
So, we would like the following from our package:
1. Able to produce an interactive HTML map or similar
2. Drawing, tooltip and popup functionality
3. Simple to use (I’ve got a lot of stuff to do and don’t want to spend all my time on mapping)
4. Preferably no dependency on a limited API
5. Bonus: extra functionalities such as search and routes
A simple but robust package will make my work faster, and the user experience better.
Therefore, I did a bit of research on mapping packages. Since my go-to language is Python, the mapping tools available to me are not the best out there. These are the tools I went through, which are Pythonic or easy enough to use:
Google maps
Google Maps deserves its own section since it’s the leading mapping tool. As such, being able to use it effectively through an API could solve most of our problems. However, this is not possible. All the APIs Google provides are limited in some way:
1. Google-maps-services-python
This package is more of a wrapper for the Google Maps API. It allows retrieving a lot of the data available in Google Maps, but I did not find any plotting capabilities, so this library is less relevant.
2. Google Maps URL API
This tool is very easy to use and has two versions, dynamic and static:
The dynamic Google Maps URL API might have been the best choice for this project. However, perhaps surprisingly, it doesn’t support many options beyond those available through the Google Maps user interface. Since one of my needs is to present a map with multiple markers, I couldn’t fully achieve it here.
https://www.google.com/maps/search/pizza+seattle+wa/@47.6257624,-122.331987,12z/data=!3m1!4b1 - searching for pizza in Seattle with the dynamic Google Maps URL API
The static Google Maps URL API is as easy as the one above and has more options, but it is static, so some of Google Maps’ dynamic abilities are off. Still, I found this tool to be the most useful for my case.
If you want to use it, note that it requires an API key, so if you don’t want to get billed you should restrict it (or let the user insert their own).
E.g. http://maps.google.com/maps/api/staticmap?center=shenkin,tel+aviv,Israel&zoom=14&size=512x512&maptype=roadmap&markers=color:blue|label:S|32.072283,34.777474&sensor=false&key=<key> generates a static map image with a blue marker labeled S.
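To build such static map URLs programmatically, a small helper like the one below could work. It is only a sketch that mirrors the parameters of the example URL above; the function name `static_map_url` and the tuple format for markers are hypothetical, and the key has to be your own.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL in this post.
STATIC_MAP_ENDPOINT = "http://maps.google.com/maps/api/staticmap"

def static_map_url(center, markers, api_key, zoom=14, size="512x512",
                   maptype="roadmap"):
    """Build a static map URL.

    `markers` is a list of (label, lat, lon) tuples; each one becomes its
    own `markers` parameter, which is how several markers fit on one map.
    """
    params = [
        ("center", center),
        ("zoom", zoom),
        ("size", size),
        ("maptype", maptype),
    ]
    for label, lat, lon in markers:
        params.append(("markers", f"color:blue|label:{label}|{lat},{lon}"))
    params.append(("sensor", "false"))  # kept to match the example URL
    params.append(("key", api_key))
    # urlencode escapes commas, colons and pipes for us
    return f"{STATIC_MAP_ENDPOINT}?{urlencode(params)}"

if __name__ == "__main__":
    print(static_map_url("shenkin,tel aviv,Israel",
                         [("S", 32.072283, 34.777474)],
                         api_key="<key>"))
```

The resulting URL is percent-encoded, so it looks noisier than the hand-written one but carries the same parameters.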
3. My Maps google app
A nice and necessary Google product is “My Maps”, which allows you to create personalized maps with locations, markers etc. However, it seems that Google has not provided an API for it, since that is “infeasible”.
Python packages
So back to our favorite independent open source Python libraries:
1. Gmplot
A simple and nice Python package that takes care of some basic plotting functionality and outputs an HTML file. It’s the only package that allows out-of-the-box plots (on a Google Maps style map), but its options are limited. No popups of any kind are possible; it is mostly WYSIWYG: circles, lines, markers, heatmaps and that’s about it. No text or routes.
Pros — Very simple
Cons — Simplistic, Requires Google API
Repo+quick start: https://github.com/vgm64/gmplot
2. Bokeh
Surprisingly, good results came from a package that is not directly related to Google Maps. Bokeh’s mapping module has everything we need, apart from the look and feel, which is not very slick.
However, no user interface is possible here either.
3. Folium
After almost giving up on Python and moving on to break my teeth on JS, I accidentally stumbled upon this great library, which rightfully has >3.5K stars on GitHub. Based on Leaflet.js maps, it looks great, has advanced drawing options, and does not require the Google API.
Folium includes some nice extra features:
- Html tool tips and popups
- Marker grouping at zoomout
- Many nice plotting options
- Seamless export to html
So Folium it is!
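To give a feel for these features, here is a minimal sketch of a Folium map with clustered markers, tooltips and HTML popups. The plaque dictionary layout (`name`, `address`, `description`, `lat`, `lon`), the function names and the output file name are assumptions for illustration, not the structure used in the repo.

```python
import html

def popup_html(name, address, description):
    """Small HTML snippet shown when a marker is clicked."""
    return (f"<b>{html.escape(name)}</b><br>"
            f"{html.escape(address)}<br>"
            f"<i>{html.escape(description)}</i>")

def build_map(plaques, out_path="plaques.html"):
    """Plot plaques as clustered markers and export the map to HTML."""
    # Imported lazily so popup_html can be used without Folium installed.
    import folium
    from folium.plugins import MarkerCluster

    # Center the map on the average coordinate of all plaques.
    center = [sum(p["lat"] for p in plaques) / len(plaques),
              sum(p["lon"] for p in plaques) / len(plaques)]
    fmap = folium.Map(location=center, zoom_start=14)
    cluster = MarkerCluster().add_to(fmap)  # groups markers on zoom-out
    for p in plaques:
        folium.Marker(
            location=[p["lat"], p["lon"]],
            tooltip=p["name"],
            popup=folium.Popup(
                popup_html(p["name"], p["address"], p["description"]),
                max_width=300,
            ),
        ).add_to(cluster)
    fmap.save(out_path)  # standalone interactive HTML file
    return out_path
```

Calling `build_map` with a list of such dictionaries writes a single HTML file that can be opened in a browser or embedded in a page.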
Step 4: Logic
So we have a map and we have markers. Now it’s time to do something smarter: route planning.
My task is to let the user enter a start address and plan a route for them to travel. Using the Google API we could get a smart walking route, but I left that to the user (in other words, it is out of the scope of this project) and only show a polyline between the designated locations.
There are different algorithms that can find optimal paths from the received address, but I chose a simple greedy algorithm that repeatedly finds the closest checkpoint. When the limit is reached (based on 2 km per hour), the script ends the route, adds a path back to the starting point, and puts everything on the map as a polyline.
Simple, isn’t it?
You can find all the logic in the app.py file, in the calc_route function in the repo.
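For readers who do not want to open the repo, the greedy idea can be sketched as below. This is not the actual `calc_route` code: the function names, the use of haversine distance, and the way the time budget is converted to a distance budget (hours times 2 km per hour) are assumptions made for illustration.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def greedy_route(start, checkpoints, hours, speed_kmh=2.0):
    """Nearest-neighbor route that stops when the distance budget runs out.

    Always walks to the closest unvisited checkpoint. The return leg to the
    start is appended at the end and is not counted against the budget.
    """
    budget_km = hours * speed_kmh
    route, walked, current = [start], 0.0, start
    remaining = list(checkpoints)
    while remaining:
        nearest = min(remaining, key=lambda p: haversine_km(current, p))
        step = haversine_km(current, nearest)
        if walked + step > budget_km:
            break  # limit reached, end the route here
        route.append(nearest)
        walked += step
        current = nearest
        remaining.remove(nearest)
    route.append(start)  # path back to the starting point
    return route
```

The returned list of coordinates is what would be drawn on the map as a polyline. Being greedy, it can miss shorter overall tours, which is the trade-off accepted in this post.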
Step 5: Implementation/deployment
So after we have all the logic in place, it’s time for the most technical part: deployment.
This part will be easy for a web developer, but as a data scientist, it’s the part I try to avoid. The most I had done before was wrapping my models in a Flask app and putting it on some cloud service.
For the backend, I decided to use Google App Engine. I mostly use Google Cloud, so it’s easier for me. Although loading the web app onto such a managed service is much slower than onto a standard server, it handles many things that I would rather not touch.
As for the front end, I took this repo and used it for my purposes with the following changes:
1. Using Flask instead of Uvicorn/Starlette, since I’m more familiar with Flask
2. Some obvious UI changes
3. The functionalities I’ve decided to add required some javascript coding
JavaScript is tricky: as a programming language it’s quite easy, but using it, especially together with Flask, can be tricky since there are many moving parts.
Even the debugging process is different, since it requires checking the app back and forth in the browser.
My strategy was to start from the template and google every change I wanted to make. This was not easy, of course, since I was not familiar with:
- Behavior of Iframes
- Passing data back and forth between HTML, Python and JavaScript
And so on.
1. Adding data: after adding the markers, we would like to give the user some extra value by adding a Google Maps image and an excerpt from the Wikipedia page
2. Path: well, currently it’s just a data map. How will we bring it to life? I thought adding a simple, artisanal route calculation algorithm would be nice. Something that I will “probably improve later”. The logic is as follows: you enter your current address and the time you would like to devote to your tour, and the system calculates a route that satisfies these conditions. The algorithm here is greedy and not optimal, and the routes are currently straight lines between the points
Step 6: Build your own mapping
Now you are probably saying to yourself, “I don’t have any interest in artists in Tel Aviv”. So I’ve got a surprise for you: you can fork my repo, get your own addresses, change data.json and deploy your own mapping tool!
You just need the gcloud command line tool and an active project as described here, and with this command you’re good to go:
gcloud app deploy
If you do it, please post in the comments.
If you start from the notebook, you can use a list of locations and descriptions, and you’ll have your own smart traveling map.
Summary
That’s it! I showed you that if you know what you’re doing, it is easy and fast to bring a dirty dataset to life. Unfortunately, it took me a lot of time to know what I’m doing, or in other words, to find the Folium library.
If you followed this post, you’ve probably seen that the work was done just to reach an MVP: many possible improvements were left out. However, it is easy to add them:
- Smarter route algorithm
- Using the Google Maps API to generate an actual route rather than a polyline
- Nicer design of the pop-ups and so on.
There are many more ideas I can think of right now. I don’t think I’ll be touching maps or JS in the next few weeks, but I encourage you to try :)