Twitter’s veritable firehose of data has its uses, particularly in the fields of AI and machine learning. Chick-fil-A recently began tapping it to spot signs of foodborne illness at its restaurants, and scientists at the Joint Research Centre, the European Commission’s science and knowledge service, earlier this year detailed a prototype that gauges real-time flood reports from Twitter users. Now, in a new city logistics study (“Unsupervised Machine Learning to Analyse City Logistics”) spearheaded by a Mines ParisTech team, a machine learning algorithm analyzed tweets to track conversation trends concerning city logistics, particularly around issues like low emission zones and urban distribution centers.

“The role of logistics is to make goods (and services) available to consumers efficiently, both in terms of costs and customer service … City logistics policymaking is complex as it requires diagnosis and analysis, thus observation,” wrote the coauthors, who noted that practical barriers often stymie large-scale city logistics surveys. “This paper … examines how [it can] contribute to … observation and analysis.”

In the course of their work, the team leveraged two key machine learning techniques known as dimensionality reduction and clustering. The former reduces the number of variables under consideration by obtaining a set of principal variables, while clustering groups objects in such a way that objects in the same group have more in common with one another than with objects in other groups.
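To make the two techniques concrete, here is a minimal sketch in plain Python. The article does not say which algorithms the team used, so principal component analysis (approximated by power iteration) and k-means stand in as common examples of each technique. The term list and the count matrix are hypothetical toy data, not figures from the study.

```python
import math

def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def first_principal_direction(centered, iterations=100):
    """Approximate the top eigenvector of the covariance matrix by power iteration."""
    n, d = len(centered), len(centered[0])
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Asymmetric start vector, so it is unlikely to be orthogonal to the answer.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, direction):
    """Dimensionality reduction: collapse each row to one coordinate."""
    return [sum(x * w for x, w in zip(row, direction)) for row in centered]

def kmeans_1d(values, k=2, iterations=20):
    """Clustering: assign each value to the nearest of k centroids."""
    ordered = sorted(values)
    # Deterministic initialization: spread centroids across the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels

# Hypothetical term counts for six tweets (rows) over four terms (columns).
TERMS = ["emission", "zone", "driver", "job"]
COUNTS = [
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [3, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [0, 0, 3, 3],
]

centered = center(COUNTS)
direction = first_principal_direction(centered)
scores = project(centered, direction)   # four variables reduced to one
labels = kmeans_1d(scores, k=2)         # tweets grouped by topic
print(labels)
```

In this toy data the first three tweets (emission zones) and the last three (driver jobs) end up in separate clusters, even though the clustering step only ever sees one number per tweet.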


The researchers’ system scraped Twitter for phrases like “city logistics,” “last-mile logistics,” “urban logistics,” and “urban freight,” then filtered the collected tweets to remove both undesired content (such as links, symbols, and linking words) and duplicate entries. Words in the text were then lemmatized, meaning inflected forms were reduced to a common base form so that they could be analyzed as a single item, and combined into a map of features.
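The filtering and feature-building steps might look roughly like the sketch below. It is an illustration rather than the team's code: the regular expressions, the `STOPWORDS` set (standing in for “linking words”), and the sample tweets are all assumptions. Lemmatization is left out because it needs an external dictionary such as the WordNet data that NLTK can download.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
# Hypothetical stand-in for the "linking words" the researchers removed.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on"}

def clean_tweet(text):
    """Lowercase, strip links and symbols, and drop linking words."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_LETTER_RE.sub(" ", text)
    return [token for token in text.split() if token not in STOPWORDS]

def deduplicate(tweets):
    """Keep the first tweet for each distinct cleaned token sequence."""
    seen = set()
    unique = []
    for tweet in tweets:
        key = tuple(clean_tweet(tweet))
        if key and key not in seen:
            seen.add(key)
            unique.append(list(key))
    return unique

def term_counts(token_lists):
    """Build a simple feature map: how often each term appears overall."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts

sample = [
    "City logistics is the FUTURE! https://example.com/x",
    "city logistics is the future",
    "Urban freight jobs",
]
unique = deduplicate(sample)
features = term_counts(unique)
print(unique)
print(features.most_common(3))
```

Deduplicating on the cleaned tokens rather than the raw text means that two tweets differing only in a link or in punctuation count as one entry.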

The team next performed sentiment analysis on the extracted text with the open source Natural Language Toolkit (NLTK), a suite of libraries and programs for English-language symbolic and statistical natural language processing. They calculated the polarity score (negative versus positive) of each sample using Valence Aware Dictionary and Sentiment Reasoner (VADER), a rule-based sentiment intensity analyzer, and computed traditional statistics.
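As a rough illustration of how a polarity score becomes a label, the sketch below mirrors two conventions from the public VADER implementation: the raw valence sum is squashed into the range -1 to 1 with x / sqrt(x² + 15), and compound scores at or beyond ±0.05 are commonly read as positive or negative. The article does not say which thresholds the team applied, so treat those as assumptions; the scores in the example are made up. In NLTK itself, the compound value comes from `SentimentIntensityAnalyzer().polarity_scores(text)`, which requires the VADER lexicon to be downloaded first.

```python
import math

def normalize(raw_score, alpha=15.0):
    """Squash an unbounded valence sum into (-1, 1), as VADER does."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)

def label(compound, threshold=0.05):
    """Map a compound score to a polarity label (threshold is an assumption)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def positive_share(compounds):
    """Fraction of samples labeled positive."""
    labels = [label(c) for c in compounds]
    return labels.count("positive") / len(labels)

# Hypothetical compound scores for four tweets.
example_scores = [0.6, -0.4, 0.0, 0.3]
print([label(s) for s in example_scores])
print(positive_share(example_scores))
```

Computing the share of positive samples per year is one way a statistic like a year-over-year change in positive content could be derived from such labels.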

On a data set of 111,265 tweets published between 2007 and 2018 containing key city logistics terms, the researchers found that the phrase “city logistics” tended to be more popular (appearing in 66% of tweets) than “urban logistics” (9% of tweets) and “urban freight” (6%), and that the most commonly occurring city logistics phrases were related to employment (“commercial driver’s license,” “job,” “CDL”). Among other insights, the team reports that Kansas City, a key transit point for commerce in the U.S., was among the five most active regions by volume of activity related to city logistics, and that the proportion of positive logistics-related content increased from 43% in 2016 to 68% in 2018.

“In order to assess what social media mining can bring to the observation of city logistics, it is critical to identify under-represented issues and/or blind spot,” wrote the researchers. “Quite satisfyingly, one can find a rather large range of issues (e.g. road safety, fuel consumption, sustainability, urban fabric, etc.) and solutions (e.g. training, ICT, urban consolidation centers, clean vehicles, cargo-bikes, etc.) … In contrast, some concepts, very much advertised in academic circles, are almost absent in the corpus (e.g. 21 tweets about the Physical Internet, eight about off-hour deliveries, three about synchro-modality).”

The team notes that their approach doesn’t consider account activity level, reach, or biases, which they leave to future work. Still, they consider Twitter a “formidable opportunity” for city logistics stakeholders because of the platform’s sheer size.