Python remove_diacritics Examples

Programming Language: Python

Namespace/Package Name: DAPOS.utils.norm.cleaner

Method/Function: remove_diacritics

Examples at hotexamples.com: 3

Python remove_diacritics - 3 examples found. These are the top-rated real-world Python examples of DAPOS.utils.norm.cleaner.remove_diacritics extracted from open-source projects.

Example #1

File: test_cleaner.py Project: xmonader/DAPOS

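import unittest

from DAPOS.utils.norm.cleaner import remove_diacritics  # path from the namespace above


class TestCleaner(unittest.TestCase):  # hypothetical class name; only the method is shown

    def test_remove_diacritics(self):
        # Empty input and text without diacritics come back unchanged.
        self.assertEqual(remove_diacritics(u''), u'')
        self.assertEqual(remove_diacritics(u'بسم الله الرحمن الرحيم'),
                         u'بسم الله الرحمن الرحيم')
        # Short vowels, shadda, sukun and tanween are stripped; base letters stay.
        self.assertEqual(
            remove_diacritics(
                u'لَا يُحِبُّ اللَّهُ الْجَهْرَ بِالسُّوءِ مِنَ '
                u'الْقَوْلِ إِلَّا مَنْ ظُلِمَ وَكَانَ اللَّهُ سَمِيعًا عَلِيمًا'
            ),
            u'لا يحب الله الجهر بالسوء من القول إلا من ظلم وكان الله سميعا عليما'
        )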
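DAPOS's own implementation of remove_diacritics is not shown on this page, but the test above pins down its observable behavior: short vowels, shadda, sukun and tanween disappear, while letters such as إ survive. A minimal sketch of that behavior, assuming the marks to strip are the Arabic harakat U+064B to U+0652. The name strip_arabic_diacritics and the exact character range are assumptions for illustration, not DAPOS's API:

```python
import re

# Assumption: "diacritics" means the Arabic harakat U+064B (fathatan) through
# U+0652 (sukun): tanween, short vowels, shadda and sukun. DAPOS may strip a
# different set; this range is meant to cover the marks used in the test above.
_HARAKAT = re.compile("[\u064B-\u0652]")


def strip_arabic_diacritics(text: str) -> str:
    """Hypothetical stand-in for remove_diacritics: drop harakat, keep letters."""
    return _HARAKAT.sub("", text)


if __name__ == "__main__":
    # lam + fatha + alef  ->  lam + alef ("لا")
    print(strip_arabic_diacritics("\u0644\u064e\u0627") == "\u0644\u0627")  # True
```

A fixed character range is used rather than NFD normalization followed by dropping combining characters: NFD decomposes إ (U+0625) into ا plus a combining hamza below (U+0655), so that approach would turn إلا into الا, which the test above does not expect.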

Example #2

File: test_cleaner.py Project: myaser/DAPOS

def test_remove_diacritics(self):
    self.assertEqual(remove_diacritics(u''), u'')
    self.assertEqual(
        remove_diacritics(u'بسم الله الرحمن الرحيم'),
        u'بسم الله الرحمن الرحيم'
    )
    self.assertEqual(
        remove_diacritics(
            u'لَا يُحِبُّ اللَّهُ الْجَهْرَ بِالسُّوءِ مِنَ '
            u'الْقَوْلِ إِلَّا مَنْ ظُلِمَ وَكَانَ اللَّهُ سَمِيعًا عَلِيمًا'
        ),
        u'لا يحب الله الجهر بالسوء من القول إلا من ظلم وكان الله سميعا عليما'
    )

Example #3

File: __init__.py Project: xmonader/DAPOS

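from DAPOS.utils.norm.cleaner import remove_diacritics

# detect_special_tokens, special_tokens and ArabicWordBreakIterator are defined
# elsewhere in DAPOS; their import paths are not shown in this excerpt.


def tokenize(text):
    """Convert raw text into a list of tokens."""
    # TODO: get pre-tokenization in settings file
    # Strip diacritics first so vocalized and bare spellings tokenize alike.
    text = remove_diacritics(text)
    # Presumably the character intervals of special tokens, kept intact below.
    pre_intervals = detect_special_tokens(text, special_tokens)
    word_breaker = ArabicWordBreakIterator()
    return word_breaker.analyse(text, pre_intervals)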
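In tokenize, diacritics are removed before any token boundaries are computed, so a vocalized word and its bare spelling should end up as the same token string. A minimal sketch of that ordering, assuming plain whitespace splitting in place of DAPOS's detect_special_tokens and ArabicWordBreakIterator, and assuming the harakat range U+064B to U+0652 as the diacritics. The helper name simple_tokenize is hypothetical:

```python
import re
from typing import List

# Assumption: harakat U+064B-U+0652 are the diacritics to strip; DAPOS's
# remove_diacritics may use a different set.
_HARAKAT = re.compile("[\u064B-\u0652]")


def simple_tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: strip diacritics first, then split on whitespace.

    Whitespace splitting stands in for DAPOS's special-token detection and
    ArabicWordBreakIterator, which are not reproduced here.
    """
    text = _HARAKAT.sub("", text)  # same first step as tokenize() above
    return text.split()


if __name__ == "__main__":
    vocalized = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, with fathas
    bare = "\u0643\u062a\u0628"                          # the same word, no marks
    print(simple_tokenize(vocalized + " " + bare) == [bare, bare])  # True
```

Because stripping happens before splitting, both spellings collapse to one token, which keeps downstream vocabularies from counting the same word twice.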