Background: The number of pedagogical web pages created each year, which reflects the number of
courses and exercises available online, exceeds the number of books published each year.
These web documents are often too long to read easily, especially when important
information is scattered across several parts and defined in a more or less formal way. Therefore,
they must be described with machine-readable data; otherwise they become unusable and impossible
to find. The main objectives of this paper are to enhance information sharing, improve trade and
increase interoperability on the web. Recently, few patents on semantic annotations have been published.
Indeed, given the great mass of data managed throughout the world, and especially the
evolution of the web towards the Semantic Web, where annotations are associated with all types of
web documents, the selection of annotations has become an important criterion in the search step.
In this article, we focus on the annotation of web documents and on the validation of this annotation.
Methods: The keywords representing a page are identified and tagged with concepts from an ontology.
The words that make up the annotation are selected by a mixed analysis that combines
their degree of similarity with their frequency. When inconsistencies are detected, the annotation is
corrected in a revision module.
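
The mixed analysis described above can be illustrated with a minimal Python sketch. It is not the system evaluated in this paper: the toy ontology, the stop-word list, the string-similarity measure (difflib.SequenceMatcher), the weight alpha and the threshold are all assumptions chosen only to show how a frequency score and a similarity score might be combined to tag keywords with ontology concepts.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical toy ontology: concept label -> illustrative terms (assumption).
ONTOLOGY = {
    "Course": ["course", "lecture", "lesson"],
    "Exercise": ["exercise", "problem", "quiz"],
}

# Small illustrative stop-word list (assumption).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "is", "with", "this", "for"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def best_concept(word):
    """Return (concept, similarity) for the ontology term closest to `word`."""
    best = (None, 0.0)
    for concept, terms in ONTOLOGY.items():
        for term in terms:
            score = SequenceMatcher(None, word, term).ratio()
            if score > best[1]:
                best = (concept, score)
    return best


def annotate(text, alpha=0.5, threshold=0.8, top_k=5):
    """Rank words by a weighted mix of relative frequency and similarity.

    Words whose best similarity to any ontology term is below `threshold`
    are discarded; the rest are tagged with the matching concept.
    """
    counts = Counter(tokenize(text))
    if not counts:
        return []
    max_count = max(counts.values())
    scored = []
    for word, count in counts.items():
        concept, sim = best_concept(word)
        if concept is not None and sim >= threshold:
            score = alpha * (count / max_count) + (1 - alpha) * sim
            scored.append((word, concept, round(score, 3)))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    page = ("This course has one exercise. "
            "The exercise follows the lecture of the course on graphs.")
    for word, concept, score in annotate(page):
        print(word, concept, score)
```

In this sketch, alpha balances the two signals: a value near 1 favours frequent words, while a value near 0 favours words that closely match an ontology term. A real system would likely replace the string similarity with a semantic measure.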
Results: The results obtained are very encouraging and show the importance of our validation
module, which is applied after the two annotation techniques are merged for keyword extraction.
This validation establishes trust between the annotation systems and the search engines that
consume the annotations they create.
Conclusion: The extraction of the words used in the annotation is a very important factor in giving
a fair representation of the documents in question. Once the annotation is made, the tests of the
validation stage ensure that the annotations are consistent and ready to be consumed by search engines.