NLP: Beyond numeric data in aviation

Artificial Intelligence (AI) is a field of computer science that aims to develop systems able to learn and reason like human beings. AI has advanced by leaps and bounds in recent years. This progress has been driven largely by improvements in processing power, which have made it possible not only to properly train and fine-tune the algorithms used to build AI models (neural networks, logistic regression…) but also to make those high-performing models available to end users with fast response times.

One of the human abilities that AI models aim to simulate is the ability to process, understand and generate natural language.

Natural language, oral or written, can be defined as any verbal language used by human beings to communicate with each other. In aviation, as in other fields, a great deal of data does not come in numeric form but is hidden in written and/or oral communications. These are the most commonly used channels of communication between humans, since they are especially effective at conveying information.

Natural Language Processing (NLP) is the set of AI techniques that address three main challenges: making systems capable of understanding natural language, making decisions based on the information extracted from it, and generating natural language to convey information as a human being does.

Figure 1. All techniques

A quick overview of NLP

Before digging any further into how NLP techniques can bring value to the aviation sector, it is important to become familiar with the types of techniques that exist and to properly understand their capabilities.

How, then, does NLP make systems capable of processing and understanding the meaning of text and voice data, and even of making automatic decisions and generating new natural language data? By developing specific AI models that combine linguistic rules with statistical, machine learning and deep learning algorithms.

NLP must be understood as a broad concept that ranges from capturing and processing unstructured data (text and voice) to making automatic decisions based on it. Since this definition is very loose and covers a vast range of needs, NLP is divided into two subcategories: Natural Language Understanding (NLU), the algorithms and AI models designed to infer information from natural language, and Natural Language Generation (NLG), those created to generate text and voice data that convey the information resulting from computer processes.

NLP applications: the gateway to more efficient aviation business

NLP techniques can be integrated into many applications for the aviation sector, ranging from document routing to trend analysis to organizing and comparing documents. These techniques can also be used wherever the natural language gap between user and system has to be reduced. While the following sections present a number of possible applications of NLU and NLG, it must be borne in mind that the two are rarely used in isolation and are usually combined.

Figure 2. NLP applications

Extracting data from inside a document or text. NLU makes it possible to apply statistical tools to texts to gain insight into their content, detect relevant content or structure the text’s data into tabulated form. This not only makes it possible to run advanced analytics on the resulting data, but can also form the cornerstone of powerful process automation.

When applied in the aviation sector:

  • Identification of sensitive data within a written and/or oral communication is a time-consuming task that can be automated using Named Entity Recognition (NER) techniques. For instance, Regulation (EU) No 376/2014 on occurrence reporting in civil aviation establishes that the identity of the reporters and of the persons mentioned in the occurrence must be protected. In this context, NER models can be applied to detect sensitive data automatically within reporter descriptions so that the appropriate mitigation measures can be applied. In the same way, sensitive data within Air Traffic Controllers' oral communications could be identified using this type of NLU technique.
  • Automatically filling in occurrence or maintenance reports by extracting information from the reporter description and/or other complementary textual or oral sources (aircraft databases, oral communications…), considerably reducing the time reporters spend on such tasks.
  • Identification of specific keywords within textual fields to determine the urgency of a query received by Airlines or Airports. Another application could be to determine the risk level of an event by detecting a specific set of keywords within the reporter description.
  • Transformation of words into numeric features based on the relevance they have within a document or a set of documents, so that algorithms can then be fed with such data. For instance, those new numeric features can be used as input for an algorithm that classifies the complaints or incidents received by the customer service of Airlines or Airports, so that they can later be redirected to the responsible department.
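
The last application above, turning words into numeric features according to their relevance, can be illustrated with a small sketch. The article does not name a weighting scheme; TF-IDF (term frequency multiplied by inverse document frequency) is assumed here as one common choice. The function names and the sample complaints are hypothetical and invented for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {word: weight} dict per document.

    TF is the word's share of the document's tokens; IDF is
    log(N / number of documents that contain the word).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical complaints, invented for illustration only.
complaints = [
    "My baggage was lost after the flight",
    "The flight was delayed and the gate changed",
    "Lost baggage and no refund was offered",
]
features = tf_idf(complaints)
```

With this weighting, a word that appears in every complaint (such as "was") gets a weight of zero, while a word found in a single complaint (such as "refund") gets the highest weight, which is what makes the resulting features useful as input for a classifier.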

Text or document classification. Text classification has a wide range of applications, from simply organizing documents to automating data flows and processes or monitoring the public reception of products. Text classification models must first be trained, which entails building a dataset of example texts that have already been labelled. Once deployed, however, a classification algorithm brings great benefits, as it can significantly reduce the time spent on such processes and increase responsiveness.

When applied in the aviation sector:

  • Documents are an important asset for any type of aviation stakeholder, but if they are not easily available, they lose value. Document classification models can be used to organize documents in meaningful ways in order to get the most out of them and reduce search times.
  • Classification models are widely used to analyse social media content or customer feedback in order to obtain intelligence about public opinion regarding new services or the organization as a whole. By applying such models, Airlines and Airports are able to classify customer feedback into different topics and extract relevant insights, such as what concerns their clients the most.
  • Certain models classify text based on the opinion that is expressed in it, normally as positive, negative or neutral (what is known as sentiment analysis). This can be applied to study trends in the public opinion expressed on social media about an Airline or Airport, providing insight into product reception or making it possible to react more efficiently to Public Relations (PR) crises.
  • In customer service, these models can be used to define how urgent and relevant each issue is and automatically prioritize them. This improves the user experience with the service while reducing the time consumed performing such tasks.
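
The train-then-classify workflow described above can be sketched in a few lines. The article does not specify an algorithm; a multinomial Naive Bayes classifier with add-one smoothing is assumed here purely as a simple, well-known example. The class name, the labels and the training sentences are hypothetical, and a real model would need far more labelled examples.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocabulary = set()

    def train(self, examples):
        """Count words per label from (text, label) pairs."""
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        """Return the label with the highest log-probability score."""
        total_examples = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_examples in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_examples / total_examples)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled feedback, invented for illustration only.
labelled_feedback = [
    ("my suitcase never arrived at the carousel", "baggage"),
    ("baggage damaged and suitcase handle broken", "baggage"),
    ("flight delayed three hours at the gate", "delay"),
    ("departure delayed again with no information", "delay"),
]
classifier = NaiveBayesClassifier()
classifier.train(labelled_feedback)
print(classifier.predict("suitcase broken on arrival"))
```

The same structure applies whether the labels are topics, sentiment classes or urgency levels; only the labelled dataset changes.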

Comparison of texts. Text comparison can be used for a number of applications, such as organizing a set of documents based on content, identifying similar texts and highlighting differences between versions of the same document. In contrast with classifiers, these AI models do not need to be trained with labelled data, so no labelled training dataset has to be developed.

When applied in the aviation sector:

  • Identifying the similarity between two texts can be useful to determine whether the two texts are independent of one another or whether one is a new version of the other. These techniques can be used to identify the differences between two reports or complaints covering the same event. Another application could be the use of distance algorithms to standardize manually entered fields, which later makes it possible to cross-match data and/or extract robust statistics based on such fields.
  • In addition to document classification, aviation stakeholders can use document clustering, such as topic modelling techniques, to automatically group documents based on their content and obtain a set of representative words for each group. A possible application for Airlines would be to use these techniques to group tweets that refer to their company by content and interpret the intent and sentiment of each group, gaining insight into public perception.
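
The text similarity, version comparison and field standardization ideas above can be sketched with Python's standard difflib module, which needs no training data. This is only one possible approach, assumed here for illustration; the helper names, the 0.6 cutoff and the sample values are hypothetical.

```python
import difflib


def similarity(text_a, text_b):
    """Return a ratio in [0, 1]; 1.0 means the texts match exactly
    (ignoring case)."""
    matcher = difflib.SequenceMatcher(None, text_a.lower(), text_b.lower())
    return matcher.ratio()


def standardize(value, reference_values, cutoff=0.6):
    """Map a manually entered value to the closest reference value.

    Returns None when nothing is similar enough (cutoff is an assumption
    that would need tuning on real data).
    """
    lowered = {ref.lower(): ref for ref in reference_values}
    matches = difflib.get_close_matches(
        value.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


def changed_lines(old_text, new_text):
    """List the lines removed ('- ') or added ('+ ') between two versions."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical reference list and report versions, for illustration only.
airports = ["Madrid-Barajas", "Barcelona-El Prat", "Palma de Mallorca"]
print(standardize("madrid barajas", airports))

version_1 = "Aircraft landed safely.\nNo injuries reported."
version_2 = "Aircraft landed safely.\nOne minor injury reported."
print(changed_lines(version_1, version_2))
```

A character-based ratio like this works reasonably for short fields with typos; comparing long reports by meaning rather than spelling would call for other techniques.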

Generation of natural language-based data. Natural Language Generation (NLG) comprises those techniques that take text and/or oral communications as input and aim to create structured text that is easily understandable by human beings.

When applied in the aviation sector:

  • Development of customer service chatbots for aviation stakeholders. These are capable of resolving simple queries and questions from customers, reducing the time spent on such tasks.
  • Summarization can be used in a number of ways, providing a reduced version of a document with the main ideas of the original. This can be implemented in customer service departments to make issues and queries faster to read, or in occurrence report investigation to present the investigator with the main ideas of the occurrence, making it easier to assess the safety risk.
  • Most aviation stakeholders produce periodic reports to be published both within and outside their organizations. NLG techniques can be applied to automatically generate these reports based on numeric and textual fields from their monitoring systems.
  • Offering personalized content based on the client’s information gives a sense of closeness and rapport that greatly affects the view the person has of the organization. Airlines can use these technologies to automatically personalize their communications based on the characteristics and interests of the segment the client belongs to. Moreover, NLG techniques could be used to personalize Safety Promotion campaigns, inside and outside organizations, adapting the content to the audience.
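
The automatic generation of periodic reports from monitoring data can be sketched with simple templates. This is a deliberately minimal, rule-based illustration and an assumption on our part, since the article does not say which NLG technique is used. The record structure, the function names and all figures are hypothetical.

```python
def describe_change(current, previous):
    """Turn two numeric values into a short phrase describing the trend."""
    if previous == 0:
        return "with no comparable figure for the previous period"
    change = (current - previous) / previous * 100
    if abs(change) < 0.5:
        return "roughly unchanged from the previous period"
    direction = "up" if change > 0 else "down"
    return f"{direction} {abs(change):.1f}% from the previous period"


def monthly_summary(record):
    """Generate one sentence per metric from a dict of monitoring data."""
    sentences = [f"Summary for {record['airport']}, {record['month']}."]
    for name, (current, previous) in record["metrics"].items():
        sentences.append(
            f"{name.capitalize()} stood at {current:,}, "
            f"{describe_change(current, previous)}."
        )
    return " ".join(sentences)


# Hypothetical figures, invented for illustration only.
record = {
    "airport": "Example Airport",
    "month": "March",
    "metrics": {
        "passengers": (1_050_000, 1_000_000),
        "reported occurrences": (18, 20),
    },
}
print(monthly_summary(record))
```

Templates like these are predictable and easy to audit, which suits recurring reports; more varied or personalized wording would require statistical or neural generation models.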

At ALG we are proud of our experience in treating natural language data in the aviation field, both to extract value from textual and oral fields and to gain intelligence from the information hidden in this type of data. Thanks to our developments, our clients have been able to gain further insight into their business from data sources that were until then hardly exploitable, to let their experts focus on tasks that add real value by automating mechanical ones, and to become better able to identify emerging issues and react to them.


About the authors
Carles Molins holds an MSc in Aeronautical Engineering and is a Consultant at ALG.
Javier M. Barragan holds an MSc in Aeronautical Engineering and an MSc in Business Intelligence and Advanced Analytics and is a Senior Consultant at ALG.
Núria Alsina holds an MSc in Aeronautical Engineering and a postgraduate degree in Business Intelligence and Advanced Analytics and is a Senior Manager at ALG.