User talk:RajeshPandey/Nepali Wikipedia Translator
Google translation toolkit
Google translation toolkit seems to be working to some extent. I had tried Google's toolkit earlier and uploaded some glossaries. Later I became frustrated with Google's translations and started developing my own translation software. Today I use it a lot. See: Nepali Wikipedia Translator.
After some months, I tried to use the Google translation toolkit to translate into Hindi, so that I could then translate the result into Nepali. This time I found that the toolkit could translate directly from English to Nepali, although not perfectly. It did give some results, and that was what I wanted at the time. I hope they will keep updating the toolkit and make it a bit more intelligent in the future. I am posting here just to share the good news; we might use the tool for translating articles directly from English Wikipedia or other websites. --Rajesh 08:47, 17 February 2011 (UTC)
If Google keeps improving and works on the Nepali language as well, I will not need to bother working on a translation tool of my own, although it was fun and it helped me create many articles on Wikipedia. I used to convert text to Hindi before I could translate it to Nepali. Now I guess it will be a lot simpler. Let's hope for the best; then I will take a rest.
I liked your effort, Rajesh ji, keep it up! - Bikash 19:44, 28 February 2011 (UTC)
- Thank you, Bikash :) --Rajesh 19:06, 7 March 2011 (UTC)
Really, these days it worries me to think what kind of doctors there are going to be.
Corpus of 10M words
I recently talked with some people about developing translation software for Nepali. They said that a corpus of 10M words is one of the minimum requirements for a basic translator, and that it would be okay if we had 100M words. I thought a requirement of 10 million words was really huge, and I still believed that a rule-based translator could do "enough" translation without such a large corpus. We could even create a corpus using the rules. It would still be really good to have such a corpus. I tried to download corpora from different sources, but most were not available: the owners have them, but they won't give them to us.
I tried to get a Hindi corpus and translate it to Nepali, because that would be easier for me. IIIT Khadagpur, Kathmandu University and the ltk might have one. While searching for a Hindi corpus, I managed to get one from IITB. I used it to check the performance of this translator and found some other challenges.
Some Hindi words (postpositions) have more than one meaning in Nepali.
For example, the word "se" ("से") has multiple meanings in Nepali: "देखि"/dekhi (since, from), "बाट"/bata (from), etc. Which one applies depends on the sentence and on the noun the word is attached to. English makes a distinction here, with "since" and "from". In such cases it would be ideal to translate from English to Nepali directly. In general, however, once the text has been translated to Hindi it contains only "se", which makes it harder to recover the context and the intended meaning. For this reason I have been skipping translations for "se/से", "mey/में" and "per/पर".
For example, the "from" in the following sentence should be translated to "se" in Hindi and to "bata" in Nepali.
I have traveled from London to Washington.
will have "se" in Hindi:
maine London se Washington tak travel kiya hay. / मैने लन्डन से वासिङ्गटन तक ट्रावल किया है।
which will have "bata" in Nepali:
mailey London bata Washington samma travel gareyko chhu. / मैले लन्डन बाट वासिङ्गटन सम्म ट्राभल गरेको छु।
Let's take another translation example:
I have been travelling since I was five.
will have "se" in Hindi:
main paanch saal se travel karti aayi hoon. / मै पाँच साल से ट्रावल कर्ति आइ हुँ।
This will have "dekhi" in Nepali:
ma paanch barsa dekhi travel gardai aai raheko chhu. / म पाँच बर्ष देखि ट्राभल गर्दै आइ रहेको छु।
So it depends on whether the English sentence actually had "since" or "from". Hindi can use "se" for both, but when translating to Nepali we need to find out which one the English meant and put the right postposition in.
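The rule described above can be sketched in a few lines of Python. This is only an illustration, not the translator's actual code: the function name `nepali_for_se` and the lookup table `SE_BY_ENGLISH` are hypothetical, and the sketch assumes the original English word is still available at the point where the Hindi "se" has to be translated.

```python
# Hypothetical lookup table: which Nepali postposition replaces Hindi "se" ("से"),
# keyed by the English word that "se" was produced from.
SE_BY_ENGLISH = {
    "from": "बाट",    # bata
    "since": "देखि",  # dekhi
}


def nepali_for_se(english_source_word):
    """Return the Nepali postposition for Hindi "se", or None if it is unknown.

    None means "leave the word untranslated for a human to fix", which mirrors
    the practice of skipping "se" when the English context is not available.
    """
    if english_source_word is None:
        return None
    return SE_BY_ENGLISH.get(english_source_word.lower())


print(nepali_for_se("from"))   # बाट
print(nepali_for_se("since"))  # देखि
print(nepali_for_se(None))     # None
```

When only the Hindi text is at hand, the function has nothing to decide on and returns None, which is the case where skipping "se" is the safer choice.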
- I found that the translator became slower as the dictionary grew.
I kept updating the words every week or so. I committed them to svn and published them on the wiki. I still hope that one day enough people will be interested in it and find it useful. In the meantime the translator was really helpful to me and I was happy with it. Whenever I got anything in Hindi that I wanted to translate, I could do it easily. I felt that the software saved about 80% of my translation time. Automated machine translation may have drawbacks, but I never cared much. Sometimes I can't get the actual meaning from the translated output, so I have to refer to the original text. That is why the software has a preview tab where I can compare what the original meant with what the software translated, e.g.:
Adding English
1. When the water in the sacred bowl of the temple began to tremble, a chill ran down my spine and I got goosebumps.
2. I was in Kali's temple for a short prayer on my way to my grandmother. I had some ripe mangoes with me for her. She loved mangoes and could chew them even though she had only three teeth.
3. I had been in Kali's temple many times before, but such a clear sign of the goddess's presence had never been seen. It was more frightening than consoling.
Wikifunctions: Diff
AutoWikiBrowser's Diff function from WikiFunctions was really cool; I read the archives about how they developed the diff algorithm back in 2007. I was using DotNetWikiBot to fetch articles from other wikis. Fetching is read-only and should not require logging in, but I had no choice: I provided a username and password to "DotnetwikiBot.cs", and it would get my article from any wiki (mostly Nepali: ne, English: en, Hindi: hi).
Later I replaced DotNetWikiBot with WikiFunctions, and it worked perfectly.
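As a side note on read-only fetching: the public MediaWiki web API can return a page's wikitext without a login. The Python sketch below only builds the request URL, so it runs without network access. It does not show how DotNetWikiBot or WikiFunctions work internally; the function name `wikitext_query_url` and the page title "Nepal" are just examples chosen for illustration.

```python
from urllib.parse import urlencode


def wikitext_query_url(lang, title):
    """Build a MediaWiki API URL that asks for the current wikitext of a page.

    `lang` is the Wikipedia language code (for example "ne", "en" or "hi").
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
    }
    return "https://{}.wikipedia.org/w/api.php?{}".format(lang, urlencode(params))


# Print one request URL per wiki mentioned above.
for lang in ("ne", "en", "hi"):
    print(wikitext_query_url(lang, "Nepal"))
```

Opening one of the printed URLs in a browser, or fetching it with any HTTP client, returns JSON that contains the page text.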