Background: Identifying new bacterial type III secreted effectors is a major computational
challenge. At least a dozen machine learning algorithms have been developed, but they have so far
achieved only limited success. Sequence similarity is important to biologists but is frequently
neglected by developers of effector prediction algorithms, although this strategy achieved
considerable success in the field a decade ago.
Objective: This study aimed to develop a sequence similarity-based effector prediction tool.
Method: We propose a recursive sequence alignment strategy using Hidden Markov Models to
comprehensively find homologs of known YopJ/P full-length proteins, effector domains, and
N-terminal signal sequences.
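The recursive idea can be illustrated with a minimal sketch: hits from one search round become queries for the next, until no new homologs appear. This is not the authors' implementation. The function names, the k-mer overlap score, and the 0.5 threshold are all assumptions chosen for illustration, and the toy scoring function merely stands in for a real Hidden Markov Model search (for example, one run with a profile HMM tool).

```python
from typing import Callable, Dict, Set


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_search(queries: Dict[str, str], database: Dict[str, str],
               min_shared: float = 0.5) -> Set[str]:
    """Hypothetical stand-in for an HMM search (assumption, not a real HMM).

    Returns IDs of database sequences that share at least `min_shared`
    of their k-mers with any query sequence.
    """
    hits: Set[str] = set()
    for db_id, db_seq in database.items():
        db_kmers = kmer_set(db_seq)
        if not db_kmers:
            continue
        for q_seq in queries.values():
            shared = len(db_kmers & kmer_set(q_seq)) / len(db_kmers)
            if shared >= min_shared:
                hits.add(db_id)
                break
    return hits


def recursive_search(seeds: Dict[str, str], database: Dict[str, str],
                     search: Callable[[Dict[str, str], Dict[str, str]], Set[str]] = toy_search,
                     max_rounds: int = 10) -> Set[str]:
    """Repeat the search, adding each round's new hits to the query set,
    until no new hits are found or `max_rounds` is reached."""
    found = dict(seeds)
    for _ in range(max_rounds):
        new_ids = search(found, database) - set(found)
        if not new_ids:
            break  # converged: no new homologs this round
        for seq_id in new_ids:
            found[seq_id] = database[seq_id]
    return set(found) - set(seeds)


if __name__ == "__main__":
    # Invented toy sequences: "b" resembles "a" but not the seed directly.
    seeds = {"seed": "AAACCCDDD"}
    database = {"a": "CCCDDDEEE", "b": "DDDEEEFFF", "c": "GGGHHHIII"}
    print(sorted(recursive_search(seeds, database)))
```

In this toy data set, sequence "b" is too distant from the seed to be found directly and is recovered only after "a" joins the query set, which is the motivation for searching recursively rather than once.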
Results: Using this method, we identified 155 different YopJ/P-family effectors and 59 proteins with
YopJ/P N-terminal signal sequences from 27 genera and more than 70 species. Among these genera, we
also identified one type III secretion system (T3SS) from Uliginosibacterium and two T3SSs from
Rhizobacter for the first time. Higher conservation of effector domains, N-terminal fusion of signal
sequences to other effectors, and the exchange of N-terminal signal sequences between different
effector proteins were frequently observed for YopJ/P-family proteins. This made it feasible to identify
new effectors based on separate similarity screening for the N-terminal signal peptides and the effector
domains of known effectors. This method can also be applied to search for homologs of other known
Conclusion: A new sequence alignment-based method was developed that can effectively facilitate
the identification of new T3SS effectors.