Background: Parsing of English has achieved effective results over the last few decades.
However, parsing a difficult language such as Arabic remains a major challenge, since
it has a rich morphology and complex linguistic characteristics not found in
other languages. Although parsing systems for Arabic have been developed recently, most of
them do not support deeper processing of Arabic sentences, such as an effective dependency
analysis that identifies, for example, the relative clauses in these sentences.
Objective: This paper develops a new framework and system that support the process of parsing Arabic
sentences and writing well-formed Arabic relative clauses.
Method: The developed framework is applied to learn the grammar rules for Arabic relative clauses
using machine learning, in particular Inductive Logic Programming (ILP). A corpus
of Arabic relative sentences was generated from the Quran and used in the experiments conducted in this research.
The sentences in this corpus were first processed with the Natural Language Processing
(NLP) toolkit Stanford CoreNLP and then given to the ILP system ALEPH to automatically
learn a grammar for Arabic relative clauses. A system was developed to extract Arabic
relative clauses from Arabic sentences based on the rules produced by ALEPH.
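
As a rough illustration of the final extraction step, the sketch below applies one rule to a dependency-parsed sentence and returns the words of each relative clause. It is a minimal sketch under stated assumptions, not the system described here: the `Token` structure, the function names, the toy English sentence, and the hand-written rule (a token attached with the Universal Dependencies label `acl:relcl` heads a relative clause) are all hypothetical stand-ins for the parser output and for the rules that ALEPH learns.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    form: str     # surface word
    head: int     # index of the governing token (0 = root)
    deprel: str   # dependency relation to the head


def subtree(tokens, root_idx):
    """Return the sorted indices of root_idx and all of its descendants."""
    children = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t.idx)
    seen, stack = set(), [root_idx]
    while stack:
        i = stack.pop()
        if i in seen:
            continue
        seen.add(i)
        stack.extend(children.get(i, []))
    return sorted(seen)


def extract_relative_clauses(tokens, rel_label="acl:relcl"):
    """Hand-written stand-in rule (an assumption, not a learned rule):
    every token attached with rel_label heads one relative clause."""
    by_idx = {t.idx: t for t in tokens}
    clauses = []
    for t in tokens:
        if t.deprel == rel_label:
            words = [by_idx[i].form for i in subtree(tokens, t.idx)]
            clauses.append(" ".join(words))
    return clauses


# Toy parse of "The man who came left" (illustrative data only).
SENTENCE = [
    Token(1, "The", 2, "det"),
    Token(2, "man", 5, "nsubj"),
    Token(3, "who", 4, "nsubj"),
    Token(4, "came", 2, "acl:relcl"),
    Token(5, "left", 0, "root"),
]

if __name__ == "__main__":
    print(extract_relative_clauses(SENTENCE))
```

In the pipeline described above, the rule would instead come from the grammar induced by ALEPH, and the tokens from the Stanford CoreNLP analysis of an Arabic sentence.
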
Results: An empirical evaluation of the developed system showed promising
results, with an overall accuracy of 83%.
Conclusion: Our results indicate that the developed system can perform deeper dependency
parsing of Arabic text and can identify relative clauses in Arabic sentences.