World Bank data scientist position

World Bank hiring.

A GPS alum recently sent me this job announcement. It’s related to a “Big Data Innovation Contest” that the Bank ran.

The data scientist will support an interdisciplinary team that delivers both technical assistance and knowledge activities to support World Bank Global Practices to put big data into action for development. The candidate will help test and incubate big data applications across several sectors, including pilots profiled in a recent publication. Further, we are seeking specialized skills in network and graph analytics for use in applications to improve and mobilize development knowledge and services within the World Bank, and toward emerging development applications in sectoral areas like trade, mobility patterns, and accessibility to jobs.

The solution areas the data scientist will support include, but are not limited to:

  • Operational Applications: Topic modeling, natural language processing, and network analytics on development and organizational information to develop innovative and automated knowledge and data products and services to improve operational effectiveness
  • Development Applications: Provide data science technical assistance to applied research projects to test and validate big data pilots that typically use non-traditional data sources and methods, including social media, mobile phone, satellite, and ground sensor data and analytics for sectoral development applications like machine learning on big data sources to estimate poverty, to monitor crop yields, road conditions, and urbanization assessments

Download and plot Google Trends data with R 

Google Trends is a service that shows the relative frequency of different Google searches over time. This can be useful supplementary information, for example when measuring trends on Twitter. You can get data by search term, date range, country, and other criteria.

The documentation is a bit sketchy and contains some bugs, so this post supplements it.

Source: Download and plot Google Trends data with R | R-bloggers

For example, this code retrieves the relative prevalence of searches including the terms “Obamacare” and “health care” over the past 7 days:

library(gtrendsR) # install.packages("gtrendsR") must be run once beforehand
usr <- "<Google account email>" # You must have a Google account. You can set one up just for this purpose.
psw <- "<Google account password>" # Be sure it does not use 2-factor authentication
gconnect(usr, psw) # This only has to be done once (per day? per session?)
G_trend <- gtrends(c("Obamacare", "health care"), res = "7d") # retrieves the data
plot(G_trend) # shows a quick plot

My comments:

  1. The numbers are relative, not absolute. The scale seems to always run from 0 to 100, and there is no way to translate those numbers into an absolute number of queries. I suggest reading Google’s documentation of Google Trends before you rely on it.
  2. The gtrends function returns a list. The exact shape of the list depends on the query you issue. The most important numerical data is in a data frame accessed as Result_name$trend (G_trend$trend in the example above).
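
To make the first comment concrete, here is a minimal sketch of why a 0 to 100 index cannot be turned back into query counts. It assumes a simplified scaling rule (the term's share of all searches, rescaled so the peak period equals 100). Google's actual procedure may differ in its details, and every number and name below (total_searches, term_searches, relative_index) is invented for illustration.

```r
# Hypothetical weekly counts; none of these numbers come from Google.
total_searches <- c(1000, 1200, 1500, 1100)  # all searches in a region, per week
term_searches  <- c(  10,   18,   30,   11)  # searches containing the term

# Simplified index (an assumption, not Google's documented formula):
# the term's share of all searches, rescaled so the peak week is 100.
relative_index <- function(term, total) {
  share <- term / total
  round(100 * share / max(share))
}

idx_small <- relative_index(term_searches, total_searches)

# A region with 1000 times the traffic produces the same index,
# so the absolute number of queries cannot be recovered from it.
idx_large <- relative_index(term_searches * 1000, total_searches * 1000)

print(idx_small)
print(identical(idx_small, idx_large))
```

Under this rule, two places with wildly different search volumes can show identical curves, which is why the index is only good for comparing shapes over time and not for counting queries.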

Analyzing 170,000,000 NYC Taxi trips

Data on cab rides in NY is available at the level of individual rides. These authors, for example, tried to predict tip amounts. Others have used the data to study flow patterns at different times of day, in different weather, etc.

Because of the huge size and detailed granularity of the data, it provides lots of opportunities for computer science-y analysis.

Different quant sub-disciplines, used for different purposes

What are the differences between data science, data mining, machine learning, statistics, operations research, and so on? Here I compare several analytic disci…

Source: 16 analytic disciplines compared to data science – Data Science Central

RB comment: Useful vocabulary for job-hunting synonyms. I don’t take the nuances of his distinctions seriously, such as “business intelligence” versus “business analytics” versus “data analysis.” Each organization needs a range of skills to do a range of activities. Even so, the list is good for showing the wide range of quant skills that are useful.