The Conflict and Protest Events dataset displays the dates, actors, types of violence, locations, and fatalities of all reported political violence and protest events across Africa, South and Southeast Asia, the Middle East, and Europe. Political violence and protest includes events that occur in the context of civil wars and periods of instability, public protest, and regime breakdown. This data can be used for immediate and long-term analysis and mapping of political violence and protest across developing countries through the use of historical data, as well as to inform humanitarian and development work in crisis- and conflict-affected contexts through real-time data updates and reports.

This data is produced by the Armed Conflict Location and Event Data Project (ACLED). The project covers all African countries from 1997 to the present, and select countries in the Middle East, Asia, and Europe from 2010 or 2018.

The ACLED scraper collects the violence types and event counts for a particular location (country, state, city, province, etc.). Data is scraped through API requests and handled using the pandas library.
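To illustrate "violence types and numbers for a particular location", the sketch below counts events per location and event type using only the standard library. The record fields `country`, `admin1`, and `event_type`, the sample rows, and the helper names are assumptions for illustration; this is not the scraper's actual code, which works with pandas.

```python
import re
from collections import Counter


def to_identifier(event_type: str) -> str:
    """Turn an event type label into a snake_case identifier."""
    return re.sub(r"[^a-z0-9]+", "_", event_type.lower()).strip("_")


def count_events(records):
    """Count events per (country, admin1, identifier) key."""
    counts = Counter()
    for rec in records:
        key = (rec["country"], rec.get("admin1", ""), to_identifier(rec["event_type"]))
        counts[key] += 1
    return counts


# Hypothetical sample rows shaped like an API response.
sample = [
    {"country": "Kenya", "admin1": "Nairobi", "event_type": "Protests"},
    {"country": "Kenya", "admin1": "Nairobi", "event_type": "Protests"},
    {"country": "Kenya", "admin1": "Nairobi", "event_type": "Explosions/Remote violence"},
]
counts = count_events(sample)
```

Each key in `counts` corresponds to one location and identifier pair, and the count plays the role of the `value` attribute.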


Column names are rectified: extra spaces, special characters, etc. are removed. Columns with irrelevant data (["geom", "gid_1", "gid_2"]) are dropped. Location names are rectified and country names are formatted correctly.
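A minimal sketch of this cleaning step, assuming that "rectified" means trimmed, lower-cased, and joined with underscores (the exact rules are not stated here). The function names and the sample record are hypothetical.

```python
import re

# Columns named in the text above as irrelevant.
DROPPED_COLUMNS = {"geom", "gid_1", "gid_2"}


def clean_column_name(name: str) -> str:
    """Trim, lower-case, and replace runs of special characters with '_'."""
    name = re.sub(r"[^a-z0-9]+", "_", name.strip().lower())
    return name.strip("_")


def clean_record(record: dict) -> dict:
    """Clean every key of a record, then drop the irrelevant columns."""
    cleaned = {clean_column_name(key): value for key, value in record.items()}
    return {key: value for key, value in cleaned.items() if key not in DROPPED_COLUMNS}


raw = {" Event Type ": "Riots", "GID_1": "KEN.30_1", "geom": None, "Country#": "Kenya"}
cleaned = clean_record(raw)
```

With pandas, the same idea could be expressed as `df.rename(columns=clean_column_name).drop(columns=list(DROPPED_COLUMNS), errors="ignore")`.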


Coordinates are added to the metadata for the country/state/city/subdivision-level location. The geocoder library is used to obtain the coordinates.
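The sketch below shows one way a place name could be turned into a GeoJSON point. It assumes the `geocoder` package's OpenStreetMap provider; which provider the pipeline actually uses is not stated. The function names and the stub coordinates are hypothetical, and the lookup is injectable so the example runs offline.

```python
from typing import Callable, Optional, Tuple

LatLng = Tuple[float, float]


def lookup_with_geocoder(place: str) -> Optional[LatLng]:
    """Look up (lat, lng) through the geocoder package (needs network access)."""
    import geocoder  # third-party; imported lazily so the stub path works without it

    result = geocoder.osm(place)
    return (result.latlng[0], result.latlng[1]) if result.ok else None


def to_geojson_point(
    place: str,
    lookup: Callable[[str], Optional[LatLng]] = lookup_with_geocoder,
) -> Optional[dict]:
    """Return a GeoJSON Point for a place name, or None if the lookup fails."""
    latlng = lookup(place)
    if latlng is None:
        return None
    lat, lng = latlng
    # GeoJSON orders coordinates as [longitude, latitude].
    return {"type": "Point", "coordinates": [lng, lat]}


# Offline usage with a stub lookup (illustrative coordinates only).
stub = {"Nairobi, Kenya": (-1.29, 36.82)}.get
point = to_geojson_point("Nairobi, Kenya", lookup=stub)
```

Passing the lookup as a parameter keeps the coordinate logic testable without calling a geocoding service.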


Additional information such as region, region_code, subdivision level, subdivision code, income level, name, domain, subdomain, source, and description is included in the metadata. The standardisation step contains a function that fetches the ISO country code and appends it. Predefined domains and subdomains are added in this step.


Metadata and event record data are ingested into MongoDB, and the latest timestamp id (the MongoDB id of the data point with the latest timestamp) is appended to the metadata to reduce the search for the latest data point.
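A rough sketch of the ingestion idea, assuming pymongo-style collections (`insert_many`, `update_one`). The field name `latest_timestamp_id`, the `ingest` function, and the in-memory stand-in are hypothetical; the stand-in exists only so the example runs without a database.

```python
from types import SimpleNamespace


class InMemoryCollection:
    """Demo-only stand-in exposing the two pymongo-style methods used below."""

    def __init__(self):
        self.docs = []

    def insert_many(self, documents):
        start = len(self.docs)
        self.docs.extend(documents)
        return SimpleNamespace(inserted_ids=list(range(start, start + len(documents))))

    def update_one(self, filter, update, upsert=False):
        for doc in self.docs:
            if all(doc.get(key) == value for key, value in filter.items()):
                doc.update(update["$set"])
                return
        if upsert:
            self.docs.append(dict(update["$set"]))


def ingest(metadata, series, metadata_coll, series_coll):
    """Insert the data points, then store the id of the newest one on the metadata."""
    if not series:
        raise ValueError("no data points to ingest")
    result = series_coll.insert_many(series)
    # inserted_ids follows the order of the input documents.
    newest = max(range(len(series)), key=lambda i: series[i]["timestamp"])
    metadata = dict(metadata, latest_timestamp_id=result.inserted_ids[newest])
    metadata_coll.update_one({"ts_ref_id": metadata["ts_ref_id"]}, {"$set": metadata}, upsert=True)
    return metadata


series = [
    {"ts_ref_id": "acled-1", "timestamp": "2023-01-01T00:00:00Z", "value": 3},
    {"ts_ref_id": "acled-1", "timestamp": "2023-02-01T00:00:00Z", "value": 5},
]
meta_coll, series_coll = InMemoryCollection(), InMemoryCollection()
stored = ingest({"ts_ref_id": "acled-1", "country": "Kenya"}, series, meta_coll, series_coll)
```

Storing the id on the metadata document means the newest data point can be fetched by id instead of sorting the whole series by timestamp.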


Three different files are available in LocationRisk. One is used for validating the data in MongoDB, and one is used to calculate the risk score for the data. Risk for ACLED is calculated using a z-score, and the z-score is classified into risk categories. The third file fetches data from MongoDB and applies the other two. The resulting risk data is ingested into the location risk database.
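The z-score step can be sketched as follows. The thresholds and category names are assumptions chosen for illustration; the actual cut-offs used in LocationRisk are not given here.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values as (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so no value deviates from the mean.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


def risk_category(z: float) -> str:
    """Map a z-score to a risk category (hypothetical thresholds)."""
    if z < 0:
        return "low"
    if z < 1:
        return "medium"
    return "high"


event_counts = [2, 4, 4, 4, 5, 5, 7, 9]
scores = z_scores(event_counts)
categories = [risk_category(z) for z in scores]
```

For these counts the mean is 5 and the standard deviation is 2, so the count of 9 has a z-score of 2.0 and falls in the highest category under the assumed thresholds.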

Data Format:

Time series Data:

| Attributes | Descriptions |
|---|---|
| ts_ref_id | Id used to connect time series data to the metadata |
| value | Time series information stored for ACLED |
| timestamp | Standard timestamp used for the time series |


Metadata:

| Short Name | Long Name | Description |
|---|---|---|
| ts_ref_id | Time series reference Id | Id used to connect metadata to the time series |
| map_coordinates | Map coordinates | Latitude and longitude of the location (GeoJSON format) |
| country | Country Name | Country name for which the conflict events are recorded |
| country_code | Country code | ISO 3166 country code |
| date_of_sampling | Date in "%d/%m/%Y" | Date on which data was collected |
| domain | Domain | Predefined domain by Taiyo |
| subdomain | Subdomain | Predefined subdomain by Taiyo |
| Identifier | Identifier / Indicator | 6 types of identifiers are stored for conflict and protest events: battles, protests, riots, explosions_remote_violence, strategic_developments, violence_against_civilians |
| location_level_1 | Location level | Location of the event |
| location_level_2 | Location level 2 | Granular location of the event |
| name | Name | Name of the source |
| objectid | Objectid as supplied by ACLED | Location-specific unique id assigned by ACLED |
| region | Region | Region for a country according to World Bank standards |
| region_code | Region Code | Region code for a region according to World Bank standards |
| value | Value of Identifiers | Number of identifier events happening at that location |
| sub_division_name | Sub Division | ISO 3166-2 subdivision name (state/country/province, etc.) |
| sub_division_code | Sub Division Code | ISO 3166-2 subdivision code |
| url | Data Source URL | URL to access the data source |
| income_level | Income level | Income level of the region in context |
| sample_frequency | Frequency | Frequency of data being collected/updated |
| time_of_sampling | Time of data collection | Time of data collection, recorded in "%I:%M:%S %p" |
| timestamp | Timestamp | UTC standard time of data sampling |
| shape_length | Shape length | Shape length of the region |
| shape_area | Shape Area | Area of the region |
| subdivision_level | Subdivision type | Metadata for the subdivision (state/province/territory, etc.) |
| city_level | City | City metadata |
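The two format strings quoted for `date_of_sampling` and `time_of_sampling` are standard `strftime` patterns. The sketch below shows what they produce for one moment in time. The function name, the conversion to UTC, and the ISO 8601 form chosen for `timestamp` are assumptions for illustration.

```python
from datetime import datetime, timezone


def sampling_fields(moment: datetime) -> dict:
    """Build the sampling-related metadata fields from a timezone-aware datetime."""
    moment = moment.astimezone(timezone.utc)
    return {
        "date_of_sampling": moment.strftime("%d/%m/%Y"),     # day/month/year
        "time_of_sampling": moment.strftime("%I:%M:%S %p"),  # 12-hour clock with AM/PM
        "timestamp": moment.isoformat(),                      # assumed storage format
    }


fields = sampling_fields(datetime(2023, 3, 5, 14, 7, 9, tzinfo=timezone.utc))
```

Note that "%d/%m/%Y" puts the day first, so 5 March 2023 is written as 05/03/2023.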

Data Flow:

The data pipeline described above runs on Argo and is executed periodically.

Scope for Improvement

The following can be improved in the next version of the data product:

  • Bulk data ingestion lacks logic for including previous data in the current dataset. Hence, the MetaData and Ingest steps need to be updated for the ACLED data product.

  • Data can be more granular.

