Background: Several symbolic inductive learning methods have been proposed, such as the induction of
decision trees [1, 2, 3], decision lists, and the AQ family [4, 5, 6]. These methods and their many
variants, introduced in the 1980s and 1990s, are useful for finding frequent patterns in databases.
However, conventional rule
mining methods assume that the dataset is fixed at the first run, so they must be rerun from scratch
every time new data appear. Since their computational complexity is O(n^2), repeated
runs limit the applicability of these methods in the era of “Big Data”. To solve this problem,
incremental learning methods have been introduced. However, most of the methods have several problems:
First, they do not perform better than conventional rule learning methods. Second, they
do not generate probabilistic rules. Third, their computational complexity is heavier than that of
conventional methods.
Methods: Using the framework of the rough set rule induction model, the authors first investigate
theoretically how the statistical indices used as rule selection criteria are updated when additional
examples arrive.
The authors have found four possibilities for the update of indices, which in turn lead to two new
rule selection criteria. If the statistical indices of a rule satisfy the first selection condition, the rule can
be used even if an additional example does not support the classification of the rule. If the statistical
indices of a rule satisfy the second pair of inequalities, the rule may be removed from the list of
classification rules in the above case, or the rule may be included in the list if an additional example
supports the classification. These rules belong to subrule layers. Based on rough set theory, the authors develop
a new rule induction method, called PRIMEROSE-INC5 (Probabilistic Rule Induction Method based on
Rough Sets for Incremental Learning Methods), which induces probabilistic rules incrementally.
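
The four update possibilities mentioned above can be illustrated with a minimal sketch. It assumes that the statistical indices are accuracy and coverage, defined as in common rough set rule models (accuracy = |R ∩ D| / |R|, coverage = |R ∩ D| / |D|); the abstract does not name the indices, and the class and method names below are hypothetical, not part of the authors' system.

```python
from dataclasses import dataclass


@dataclass
class RuleCounts:
    """Counts for a hypothetical rule R -> D (illustrative only)."""
    n_r: int   # examples that satisfy the rule's condition R
    n_d: int   # examples that belong to the target class D
    n_rd: int  # examples that satisfy R and belong to D

    def accuracy(self) -> float:
        # Assumed definition: |R and D| / |R|
        return self.n_rd / self.n_r if self.n_r else 0.0

    def coverage(self) -> float:
        # Assumed definition: |R and D| / |D|
        return self.n_rd / self.n_d if self.n_d else 0.0

    def update(self, matches_r: bool, in_d: bool) -> None:
        """Update the counts with one additional example.

        The four (matches_r, in_d) combinations give four cases:
        both indices change, only accuracy changes, only coverage
        changes, or neither changes.
        """
        if matches_r:
            self.n_r += 1
        if in_d:
            self.n_d += 1
        if matches_r and in_d:
            self.n_rd += 1


counts = RuleCounts(n_r=8, n_d=10, n_rd=6)
# New example satisfies R but is not in D: accuracy falls, coverage is unchanged.
counts.update(matches_r=True, in_d=False)
print(counts.accuracy(), counts.coverage())
```

Because only three counters are maintained per rule, each additional example costs a constant-time update per rule rather than a full pass over the data, which is the motivation for an incremental scheme of this kind.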
Results: The system was evaluated on two medical datasets that were previously
used to evaluate conventional rule induction methods. One dataset was on the differential diagnosis
of headaches, which consists of 1477 examples with 10 disease classes and 20 attributes. The other
dataset was on meningitis, which consists of 198 examples with 3 classes and 25 attributes. The system
was compared with conventional rule induction methods using 10-fold cross-validation
repeated 100 times. The experimental results showed that the proposed system outperformed
the previously introduced methods.
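
The evaluation protocol described above can be sketched as follows. This is only an illustration of how repeated 10-fold cross-validation splits could be generated; the function name `repeated_kfold`, the fixed seed, and the absence of class stratification are assumptions, since the abstract does not describe how the folds were formed.

```python
import random


def repeated_kfold(n_examples, k=10, repeats=100, seed=0):
    """Yield (train_indices, test_indices) pairs for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_examples))
        rng.shuffle(indices)  # a fresh random partition for every repetition
        folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
        for i in range(k):
            test = folds[i]
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


# Example: the meningitis dataset has 198 examples.
n_splits = sum(1 for _ in repeated_kfold(198, k=10, repeats=100))
print(n_splits)  # 100 repetitions x 10 folds = 1000 train/test splits
```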