Forced Migration from Ukraine: Lessons Learned from Organic Data
Lisa Singh, Katharine Donato, Ali Arab,
Nathan Wycoff, Elizabeth Jacobs
Georgetown University, USA
Broad Goal
What types of organic data can improve our understanding of emerging and/or prolonged forced migration?
What is Organic Data?
Non-design data generated as part of a person’s routine and/or a society’s normal functions
Strengths
- Generated in a more natural setting
- Offers real-time data for analysis
- Promising in difficult-to-access environments, where design data are hard to obtain
- Lots of it
Weaknesses
- Hard to process
- Difficult to generate variables
- Noisy, partial and biased
- Possible ethical considerations
Organic (Big) Data Sources Used in Research
Newspapers (MaDD)
Twitter (MaDD)
Google Trends (MaDD)
ACLED events
GDELT
Facebook advertising
LinkedIn
Reddit
YouTube comments
WhatsApp
Social media public groups
Predicting International Migration Flow from Ukraine
Based on UNHCR flow data
First 6 Months of Conflict
UNHCR Flow Data from Ukraine
Constructing Variables: Twitter Example
Ukrainian keywords used to construct conversation buzz variables
Flee measures
Flee: I am leaving; going to; taking train to; arrived at
Insecurity measures
Physical: Weapons; soldiers; rockets; bombs; explosion; attack; deaths
Food: Hunger; food shortage; rationing; drinking water
Health: COVID; corona; omicron; pandemic; hospitals; medical supplies
Contextual measures
Political: Zelensky; Putin; negotiations; declaration; protests; war
Economic: Economy; exchange rate; gas; oil; sanctions; exports; money
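A minimal sketch of how keyword lists like these could be turned into daily "conversation buzz" counts. It is an illustration under stated assumptions, not the team's pipeline: the function name `daily_buzz`, the input format (date, text pairs), and the plain substring matching are all hypothetical, and the English phrases stand in for the Ukrainian keywords actually used.

```python
from collections import Counter
from datetime import date

# Hypothetical keyword lists: English stand-ins for the Ukrainian keywords.
KEYWORDS = {
    "flee": ["i am leaving", "going to", "taking train to", "arrived at"],
    "physical_insecurity": ["weapons", "soldiers", "rockets", "bombs",
                            "explosion", "attack", "deaths"],
    "food_insecurity": ["hunger", "food shortage", "rationing", "drinking water"],
}


def daily_buzz(posts, keywords=KEYWORDS):
    """Count, per measure and per day, the posts mentioning at least one keyword.

    `posts` is an iterable of (date, text) pairs. Matching is a simple
    case-insensitive substring test, which is an assumption of this sketch.
    """
    counts = {measure: Counter() for measure in keywords}
    for day, text in posts:
        lowered = text.lower()
        for measure, terms in keywords.items():
            if any(term in lowered for term in terms):
                counts[measure][day] += 1
    return counts


# Made-up example posts, for illustration only.
posts = [
    (date(2022, 3, 1), "Taking train to Lviv tonight"),
    (date(2022, 3, 1), "Explosion near the station, rockets overhead"),
    (date(2022, 3, 2), "Food shortage in our district, no drinking water"),
]
buzz = daily_buzz(posts)
```

Each measure ends up as a daily time series that can be aligned with the flow data. Substring matching is crude (for example, "going to" will match many unrelated posts), so a real pipeline would likely need language-specific tokenization and filtering.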
Modeling: Using Organic Variable to Measure Flow
- Order of Magnitude of Outflow to Slovakia, Hungary and Poland.
- Lag and Aggregation (Laggregated) Organic Variable. Labeled terms: window radius; lag; regression coefficient vector.
- Day of week effect.
Gaussian, Poisson, and Negative Binomial likelihoods give qualitatively similar results.
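The slide's equations did not survive extraction, so the following is only a sketch of one way the listed ingredients could fit together. It assumes that "order of magnitude" means log10 of the daily flow, that the laggregated variable is a centered moving average of the organic signal shifted back by the lag, that a positive lag means the signal precedes the flow, and that the likelihood is Gaussian (ordinary least squares). The function names `laggregate` and `fit_flow_model` are hypothetical.

```python
import numpy as np


def laggregate(x, lag, radius):
    """Mean of x over days [t - lag - radius, t - lag + radius] for each day t.

    Days whose window extends outside the series are set to NaN.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    for t in range(len(x)):
        lo, hi = t - lag - radius, t - lag + radius
        if lo >= 0 and hi < len(x):
            out[t] = x[lo:hi + 1].mean()
    return out


def fit_flow_model(flow, signal, lag, radius):
    """Least-squares fit of log10(flow) on the laggregated signal plus
    day-of-week indicators. Returns the coefficient vector
    [intercept, signal effect, six day-of-week effects]."""
    y = np.log10(np.asarray(flow, dtype=float))  # flow must be positive
    z = laggregate(signal, lag, radius)
    days = np.arange(len(y)) % 7
    dow = (days[:, None] == np.arange(1, 7)).astype(float)  # day 0 is the baseline
    X = np.column_stack([np.ones(len(y)), z, dow])
    keep = ~np.isnan(z)  # drop days without a full window
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta
```

Swapping the least-squares step for a Poisson or negative binomial regression on the raw counts would change the likelihood while keeping the same design matrix.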
Comparing Data Sources' Relationship to Flow
[Figure: counts of individuals leaving Ukraine. Legend: UNHCR flow estimate; organic data estimate.]
Explainability vs Timeliness
[Figure: fit (R squared, axis ticks 0.2 to 0.8) plotted against lag in days (axis ticks -20 to 20, with regions labeled LAGGING and LEADING). Legend: Google Trends, GDELT, ACLED, Newspapers, Twitter. Plotted variables: travel, flee, physical insecurity, food insecurity, health insecurity, political, economic, event count, fatalities, total negative tone, total positive tone.]
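A comparison of fit against lag can be sketched as a scan over candidate lags, recording R squared at each one. This is an assumption-laden illustration rather than the analysis behind the figure: it uses a straight-line fit of flow on a single shifted signal, and it adopts the convention that a positive lag means the signal precedes the flow. The names `r_squared_at_lag` and `scan_lags` are hypothetical.

```python
import numpy as np


def r_squared_at_lag(flow, signal, lag):
    """R squared of a straight-line fit of flow[t] on signal[t - lag].

    For a one-predictor linear fit, R squared equals the squared correlation.
    """
    flow = np.asarray(flow, dtype=float)
    signal = np.asarray(signal, dtype=float)
    n = len(flow)
    # Keep only the days t for which t - lag is a valid index.
    t = np.arange(max(lag, 0), min(n, n + lag))
    r = np.corrcoef(signal[t - lag], flow[t])[0, 1]
    return r ** 2


def scan_lags(flow, signal, lags):
    """Map each candidate lag to its R squared."""
    return {lag: r_squared_at_lag(flow, signal, lag) for lag in lags}


# Synthetic check: the flow is the signal delayed by three days.
rng = np.random.default_rng(0)
signal = rng.normal(size=60)
flow = np.roll(signal, 3)
scores = scan_lags(flow, signal, range(-10, 11))
best_lag = max(scores, key=scores.get)
```

Plotting each variable's best R squared against its best lag gives one point per variable, which is one plausible reading of the explainability versus timeliness comparison.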
Prediction Error at Two Time Points
PHASE 1:
GDELT negative tone
News - health
PHASE 2:
Previous week’s mean
Google Trends - food insecurity
Lessons Learned
Because Ukraine has relatively granular flow data, a simple model that uses the mean flow from the previous
week is a reasonable predictor of international migration once the initial hump has passed.
When a crisis emerges, public organic data sources are a viable option for modeling the changing dynamics of flow.
For this crisis, Google Trends data were generally the best organic data signal for retrospective analysis and
nowcasting. All organic sources captured the two phases of the crisis.
For longer-term forecasting in countries with sparser flow data, more variables are needed, and models that quantify
uncertainty and allow for more variation in temporal and spatial resolution (for example, hierarchical Bayesian models) are important.
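The previous-week mean model from the first lesson can be sketched as a trailing seven-day average. The exact definition is an assumption here (the forecast for a day is the mean of the seven preceding days), and the function names are hypothetical.

```python
import numpy as np


def previous_week_mean_forecast(flow):
    """Forecast for day t is the mean flow over days t-7 .. t-1.

    The first seven days have no full week of history and are set to NaN.
    """
    flow = np.asarray(flow, dtype=float)
    forecast = np.full(flow.shape, np.nan)
    for t in range(7, len(flow)):
        forecast[t] = flow[t - 7:t].mean()
    return forecast


def mean_absolute_error(flow, forecast):
    """Mean absolute error over the days that have a forecast."""
    flow = np.asarray(flow, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    keep = ~np.isnan(forecast)
    return np.abs(flow[keep] - forecast[keep]).mean()
```

A baseline like this is a useful yardstick: an organic-data model is only worth its added complexity when its error is lower than the baseline's over the same period.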
MaDD Core Team
Social Scientists
Faculty
Katharine Donato
Elizabeth Ferris
Susan Martin
MDI/ISIM Fellow
Nathan Wycoff
Elizabeth Jacobs*
Computational Scientists
Faculty
Lisa Singh
Ali Arab
Ameeta Agrawal
MDI Technical Team
Colton Padden*
Yiqing Ren
Maanasa Vatsavayi
Interdisciplinary Students
o Yanchen Wang (PhD)
o Rob Churchill (PhD)*
o Yaguang Liu (PhD)
o Ken Kawintiranon (PhD)*
o Didier Akilimali (MS)
o Qihang Wang (MS)
o Sonali Rathinam (MS)
o Aidan Pizzo (undergrad)
o Jenny Park (undergrad)
Thanks to McCourt Institute, Massive Data Institute,
& Institute for the Study of International Migration
for their funding.
* Recent alumni