Machine Learning Deciphers News Language to Predict Country’s Peace Level

Summary: Researchers devised a machine learning model that gauges a country’s peace level by analyzing word frequency in its news media.

By studying over 723,000 articles from 18 countries, the team identified distinct linguistic patterns corresponding to varying peace levels.

While high-peace countries often used terms related to optimism and daily life, lower-peace nations favored words tied to governance and control.

The algorithm, though English-biased, offers a fresh lens to explore linguistic differences across cultures.

Key Facts:

  1. The study analyzed 723,574 media articles from 18 countries, each classified as high-peace, intermediate-peace, or low-peace.
  2. High-peace countries predominantly used words hinting at optimism and day-to-day life, while low-peace countries leaned towards terms related to government and control.
  3. The machine learning model, trained only on high-peace and low-peace countries, correctly placed intermediate-peace countries between the two groups even though they were not in its training set.

Source: PLOS

By analyzing the frequency of certain words within mainstream news media from any country, a machine learning algorithm can produce a quantitative “peace index” that captures the level of peace within that country, according to a new study published this week in the open-access journal PLOS ONE by Larry Liebovitch and Peter T. Coleman of Columbia University, US, and colleagues.

The language used in media both reflects a culture’s view of the world and influences how people within the culture think and act. “Hate speech” can mobilize violence and destruction. Much less is known about how “peace speech” characterizes peaceful cultures and may help to generate or sustain peace.

In the new study, Liebovitch and colleagues used five previously developed and highly respected peace indices to capture levels of peace within 18 countries classified as high-peace, intermediate-peace or low-peace. They then collected 723,574 media articles originating from these countries; all were written by local sources and published online in English.

Working only with articles from the high-peace and low-peace countries, the researchers trained a machine learning model to identify words whose use in the media was associated with levels of peace.

Overall, lower-peace countries were characterized by a higher prevalence of words related to government, order, control and fear (such as government, state, law, security and court), while higher-peace countries were characterized by a higher prevalence of words related to optimism for the future and fun (such as time, like, home, believe and game).

When the researchers applied the trained machine learning model to media from the intermediate-peace countries that had not originally been included, the model correctly identified the countries as having intermediate levels of peace.
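The train-on-the-extremes, score-the-rest idea can be illustrated with a minimal sketch. The article does not say which model the researchers used, so the log-ratio word weighting below is a simple stand-in, not the authors' method. The function names (`word_freqs`, `train_weights`, `peace_score`), the smoothing constant and the toy sentences are all invented for illustration; the sentences only reuse example words quoted in the article.

```python
import math
from collections import Counter

def word_freqs(articles):
    """Relative frequency of each lowercase word across a list of article strings."""
    counts = Counter(w for text in articles for w in text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def train_weights(high_articles, low_articles, smoothing=1e-3):
    """Weight per word: log ratio of its smoothed frequency in high- vs low-peace text."""
    high, low = word_freqs(high_articles), word_freqs(low_articles)
    vocab = set(high) | set(low)
    return {
        w: math.log((high.get(w, 0.0) + smoothing) / (low.get(w, 0.0) + smoothing))
        for w in vocab
    }

def peace_score(articles, weights):
    """Frequency-weighted average of word weights; unseen words contribute zero."""
    freqs = word_freqs(articles)
    return sum(f * weights.get(w, 0.0) for w, f in freqs.items())

# Invented toy text (assumption): not data from the study.
high_peace = ["time at home like a game", "believe in time at home"]
low_peace = ["government security law court", "state law and government security"]
held_out = ["government law at home", "time in court like a game"]

# Train only on the two extreme groups, then score text the model never saw.
weights = train_weights(high_peace, low_peace)
scores = {
    "high": peace_score(high_peace, weights),
    "held_out": peace_score(held_out, weights),
    "low": peace_score(low_peace, weights),
}
print(scores)
```

In this toy setup the held-out text mixes vocabulary from both groups, so its score lands between the two training groups, which mirrors the kind of result the article describes for the intermediate-peace countries.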

The authors point out that their data was biased because all the sources were in English, which means the model is more reliable for countries where English is a more common language for news communication. Additionally, the method may carry over any bias already built into the existing peace indices used in the work.

Despite the limitations, the authors conclude that the data serves as a good starting point for further exploring the linguistic differences between lower-peace and higher-peace cultures.

The authors add: “We used machine learning to find the words in local news media that best indicate the level of peace in a country. In less peaceful countries, news media focus on government and social control, while in more peaceful countries, its focus is on personal preferences and the activities of everyday life. We also found that high-peace countries evidenced a much higher level of diversity of terms than low-peace countries.”

About this AI and linguistics research news

Author: Hanna Abdallah
Source: PLOS
Contact: Hanna Abdallah – PLOS

Original Research: Open access.
“Word differences in news media of lower and higher peace countries revealed by natural language processing and machine learning” by Larry S. Liebovitch et al. PLOS ONE


Abstract

Word differences in news media of lower and higher peace countries revealed by natural language processing and machine learning

Language is both a cause and a consequence of the social processes that lead to conflict or peace. “Hate speech” can mobilize violence and destruction. What are the characteristics of “peace speech” that reflect and support the social processes that maintain peace?

This study used existing peace indices, machine learning, and on-line, news media sources to identify the words most associated with lower-peace versus higher-peace countries. As each peace index measures different social properties, they can have different values for the same country.

There is however greater consensus with these indices for the countries that are at the extremes of lower-peace and higher-peace. Therefore, a data driven approach was used to find the words most important in distinguishing lower-peace and higher-peace countries.
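One way to picture "consensus at the extremes" is a rule that keeps a country for training only when every index agrees on where it sits. The sketch below is an assumption for illustration: the abstract does not describe the actual selection procedure, and the function `consensus_label`, the cut-off values, the index names and the scores are all invented placeholders.

```python
def consensus_label(index_scores, low_cut=0.33, high_cut=0.67):
    """Return 'higher', 'lower', or 'mixed' depending on whether all indices agree."""
    values = list(index_scores.values())
    if all(v >= high_cut for v in values):
        return "higher"
    if all(v <= low_cut for v in values):
        return "lower"
    return "mixed"

# Invented scores rescaled to 0..1 (1 = most peaceful); not values from any real index.
countries = {
    "Country A": {"index_1": 0.91, "index_2": 0.85, "index_3": 0.78},
    "Country B": {"index_1": 0.12, "index_2": 0.30, "index_3": 0.08},
    "Country C": {"index_1": 0.70, "index_2": 0.41, "index_3": 0.25},
}

labels = {name: consensus_label(scores) for name, scores in countries.items()}

# Only countries on which every index agrees would go into a training set.
training_set = {name: lab for name, lab in labels.items() if lab != "mixed"}
print(labels)
print(training_set)
```

Here the hypothetical Country C is left out of training because its indices disagree, which is the situation the abstract attributes to each index measuring different social properties.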

Rather than assuming a theoretical framework that predicts which words are more likely in lower-peace and higher-peace countries, and then searching for those words in news media, in this study, natural language processing and machine learning were used to identify the words that most accurately classified a country as lower-peace or higher-peace.

Once the machine learning model was trained on the word frequencies from the extreme lower-peace and higher-peace countries, that model was also used to compute a quantitative peace index for these and other intermediate-peace countries.

The model successfully yielded a quantitative peace index for intermediate-peace countries that was in between that of the lower-peace and higher-peace, even though they were not in the training set.

This study demonstrates how natural language processing and machine learning can help to generate new quantitative measures of social systems, which in this study, were linguistic differences resulting in a quantitative index of peace for countries at different levels of peacefulness.