Files to convert to WIN 866 (OEM Russian) <=> UTF-8

This forum is for eXpress++ general support.
Eugene Lutsenko
Posts: 1649
Joined: Sat Feb 04, 2012 2:23 am
Location: Russia, Southern federal district, city of Krasnodar

#1 Post by Eugene Lutsenko »

How can I convert files between WIN 866 (OEM Russian) and UTF-8 by means of Alaska?

I think I am misunderstanding something. It seems that UTF-8 can be in DOS format (ASCII) or in the Windows character set (ANSI):
Attachment: Без имени-1.jpg (142.69 KiB)


Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8

#2 Post by Eugene Lutsenko »

I installed the enca and recode packages under UNIX and tried the command highlighted in the window shown in the attachment. Everything was converted correctly: the command detected the file's current encoding and converted it to UTF-8. To run UNIX commands from my program, I generate a bat file that launches bash with the commands and their paths. Everything works.

PS
In bash all of this is elementary, but it is difficult to use under Windows.
Attachment: Безымянный.jpg (91.53 KiB)



Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8

#4 Post by Eugene Lutsenko »

http://www.mashkov.com/2015/01/22/%D0%B ... ault_1251/

Question:
How do I recode files between utf8, ascii, oem, UTF32, UTF7, BigEndianUnicode, Unicode, and Default (Windows-1251)?
Answer:
You can use a PowerShell command. Below is an example of recoding a file from utf8 to windows-1251, ANSI Cyrillic (the operating system's encoding), on the command line.

utf8 => 1251
C:\>powershell.exe "Get-Content -Encoding UTF8 'c:\text file.txt' | Out-File -Encoding Default 'c:\text file.txt.Default'"

1251 => utf8
C:\>powershell.exe "Get-Content -Encoding Default 'c:\text file.txt' | Out-File -Encoding UTF8 'c:\text file.txt.Default'"

You can also recode files between the following encodings:
1. ascii;
2. BigEndianUnicode (UCS-2 Big Endian);
3. default (the operating system's encoding; Windows-1251 in Russia);
4. oem (OEM 866);
5. Unicode (UCS-2 Little Endian);
6. utf32;
7. utf7;
8. utf8.
Detailed information on the Get-Content for FileSystem command is available on the developer's site.
Identifiers of the various encodings: Code Page Identifiers on the developer's site.
The page on running a PowerShell script from the command line may also be useful.
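
For anyone scripting this outside PowerShell, the encoding names in the list above map roughly onto Python codec names as sketched below. The mapping is an assumption, not something taken from the linked article: `default` and `oem` depend on the system locale (a Russian Windows installation is assumed here), byte-order marks are ignored, and `PS_TO_PYTHON` and `recode` are hypothetical names used only for illustration.

```python
# Hypothetical mapping from PowerShell -Encoding names to Python codecs.
# Assumes a Russian Windows locale (default = cp1251, oem = cp866).
PS_TO_PYTHON = {
    "ascii": "ascii",
    "bigendianunicode": "utf-16-be",
    "default": "cp1251",
    "oem": "cp866",
    "unicode": "utf-16-le",
    "utf32": "utf-32-le",
    "utf7": "utf-7",
    "utf8": "utf-8",
}


def recode(data: bytes, src: str, dst: str) -> bytes:
    """Recode bytes between two PowerShell-style encoding names."""
    text = data.decode(PS_TO_PYTHON[src.lower()])
    return text.encode(PS_TO_PYTHON[dst.lower()])


if __name__ == "__main__":
    sample = "Привет".encode("utf-8")
    print(recode(sample, "utf8", "default"))  # bytes in Windows-1251
    print(recode(sample, "utf8", "Unicode"))  # UTF-16 LE, which is not UTF-8
```

The point worth noticing is that `Unicode` here means UTF-16 LE, so a command that uses `-Encoding Unicode` does not read or write UTF-8.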

Victorio
Posts: 643
Joined: Sun Jan 18, 2015 11:43 am
Location: Slovakia

Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8

#5 Post by Victorio »

Eugene,

I do not have time right now to search for more options for recoding text files, although I tested many programs and utilities while working on encoding, decoding, and code-page detection for text files in my app.
One utility I use is xcode.exe (attached). I cannot find the web address it came from, but I believe it is a Russian product :) so you should be able to find it.
This utility can detect code pages and convert between them, including CP866 and W1251.
The syntax is xcode -c -e %1 %1x
where -e selects English output; if you leave it out, the output is in Russian.

When run as xcode -c -e zp807231oem.red zp807231oem.redx
the program writes the code page used in the file ZP807231oem.red to the file zp807231oem.redx
(the .red file is just a text file).
This is the result: cp866: zp807231ansi.rec

Syntax for xcode is :
Usage: xcode -E -[hH?] -[wkaim1234567890] +[wkaim1234567890] [-q] [in [out]]
-E -h in English (don't forget to add -h or -H switch!)
-v print version information
-H manual, list of 14 encodings supported, and view YO-ware license
-d double recoding (try if simple 'xcode' failed)
-q quoted-printable decoding (useful for decoding MIME-files)
-l decode html Unicoded text (like &#1044;&#1080;&#1084;&#1072;)
-c determine encoding and print it to the output (see details by -H)
-t do unix2dos transformation (convert LF to CR/LF) in DOS/Win only
-p pipe mode (applies to DOS/Win environment only)
-s silent mode (no information on encodings displayed)
If input/output files are not specified, the standard input/output is used.
-a to set cp866 output (default)
-w to set cp1251 output
-k to set koi8-r output
-i to set iso8859-5 output
-m to set mac output
+a to force cp866 input
+w to force cp1251 input
+k to force koi8-r input
+i to force iso8859-5 input
+m to force mac input


Another utility is the free converter from PokludaCZ, http://www.pokluda.cz
It can also be run from the command line:
czkonverze /00 /20 "zp807231oem.red" >vystup1.log
czkonverze /20 /00 "zp807231ansi.red" >vystup2.log

However, this utility supports only the W1250 code page, not W1251.

In Alaska I have written only a code-page detector. It counts how often certain characters occur and then decides statistically which code page the text is closest to.

Here is some source; the input parameter is one line of the text:

*****************************
* TEXT CODE PAGE DETECTOR   *
*****************************
****************************
FUNCTION DETEKTORCP(riadok)
****************************

* define the variables and the array of characters used for detection
Local pocet[7]

/*
Local detect := ;
{ "臟č‹Ćc", ;
"ų©żųŽŅr", ;
"šØē¹äÓs", ;
"˛‘§¾ģŚz", ;
"ó¢¢ó—Ļo", ;
"į  į‡Įa", ;
"é‚‚éˇ×e", ;
"ś££śœÕu", ;
"ķķ’Éi" ;
}
*/

Local detect := ;
{ "臟č‹Ćc", ;
"ų©żųŽŅr", ;
"šØē¹äÓs", ;
"˛‘§¾ģŚz", ;
"ó¢¢ó—Ļo", ;
"į  į‡Įa", ;
"é‚‚éˇ×e", ;
"ś££śœÕu", ;
"ķķ’Éi", ;
"Čķ’Éi", ;
"¼ķ’Éi", ;
"Šķ’Éi", ;
"¨ķ’Éi", ;
"ˇķ’Éi", ;
"Ļķ’Éi", ;
"żķ’Éi" ;
}


* reset the counters
for k=1 to 7
pocet[k]:=0
next

* loop that reads and tests every character of the line
for i=1 to len(riadok)
* test only characters above CHR(127)
if riadok[i]>chr(127)
* scan the character sets (originally 9, now 16)
* for j=1 to 9
for j=1 to 16
* test every character of the set; each set holds 7 characters
for k=1 to 7
if riadok[i]==detect[j][k]
pocet[k]++
endif
next
next
endif
next

* evaluate here which characters occur most often, according to pocet[k]
*ladenie("pocet[1]"+str(pocet[1]))
*ladenie("pocet[2]"+str(pocet[2]))
*ladenie("pocet[3]"+str(pocet[3]))
*ladenie("pocet[4]"+str(pocet[4]))
*ladenie("pocet[5]"+str(pocet[5]))
*ladenie("pocet[6]"+str(pocet[6]))
*ladenie("pocet[7]"+str(pocet[7]))

/*
k=1
pompocet=pocet[k]
if pocet[2]>pompocet
pompocet=pocet[2]
k=2
endif
if pocet[3]>pompocet
pompocet=pocet[3]
k=3
endif
if pocet[4]>pompocet
pompocet=pocet[4]
k=4
endif
if pocet[5]>pompocet
pompocet=pocet[5]
k=5
endif
if pocet[6]>pompocet
pompocet=pocet[6]
k=6
endif
if pocet[7]>pompocet
pompocet=pocet[7]
k=7
endif
*/

* for now a simpler evaluation, because the complete one does not give correct results for CP850/Win1250
if pocet[1]>0
kodstr=1250
else
kodstr=852
endif

ladenie("kódová stránka "+str(kodstr))

RETURN kodstr


Maybe this gives you some inspiration.
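
The same statistical idea can be sketched for Russian text in Python. This is not a port of the function above: it assumes the candidates are cp866, cp1251 and koi8-r, and it uses a hypothetical scoring rule (count how many decoded characters are among the most frequent lowercase Russian letters). Short or mostly uppercase input can easily fool it.

```python
# Most frequent lowercase Russian letters; used as a crude language model.
COMMON_LETTERS = set("оеаинтсрвлкмдпу")

# Candidate code pages to try (an assumption; extend as needed).
CANDIDATES = ("cp866", "cp1251", "koi8-r")


def score(data: bytes, encoding: str) -> int:
    """Count how many decoded characters are common Russian letters."""
    text = data.decode(encoding, errors="replace")
    return sum(1 for ch in text if ch in COMMON_LETTERS)


def detect_code_page(data: bytes) -> str:
    """Return the candidate whose decoding looks most like Russian text."""
    return max(CANDIDATES, key=lambda enc: score(data, enc))


if __name__ == "__main__":
    line = "Привет, мир! Это проверка кодировки."
    for enc in CANDIDATES:
        print(enc, "->", detect_code_page(line.encode(enc)))
```

A wrong decoding turns most letters into box-drawing characters, symbols, or uppercase letters, so its score stays well below that of the correct code page.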


Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8

#6 Post by Eugene Lutsenko »

Hi, Victorio!
Thank you very much!

I found where to download this console utility; there is even a whole manual about it. According to the manual it looks like just what is needed. But I need to recode files from 866 or 1251 to UTF-8 and back, because I use an online translator (run from bash) that works only with files encoded in UTF-8, while Alaska works with 866 (ASCII code) and 1251 (ANSI).
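
The round trip described here (866 or 1251 to UTF-8 and back) can be sketched in Python as follows. The function names `to_utf8` and `from_utf8` are hypothetical. The one thing to watch is the way back: UTF-8 can hold characters that a legacy code page lacks, so the sketch makes the error policy explicit.

```python
def to_utf8(data: bytes, src: str) -> bytes:
    """Decode bytes from a legacy code page and re-encode them as UTF-8."""
    return data.decode(src).encode("utf-8")


def from_utf8(data: bytes, dst: str, errors: str = "strict") -> bytes:
    """Decode UTF-8 bytes and re-encode them in a legacy code page.

    With errors="strict" a character missing from the target code page
    raises UnicodeEncodeError; errors="replace" substitutes "?" instead.
    """
    return data.decode("utf-8").encode(dst, errors=errors)


if __name__ == "__main__":
    oem = "Привет из Краснодара".encode("cp866")
    utf8 = to_utf8(oem, "cp866")
    assert from_utf8(utf8, "cp866") == oem  # lossless round trip

    # The quotation marks exist in cp1251 but not in cp866.
    quoted = "«да»".encode("utf-8")
    print(from_utf8(quoted, "cp1251"))
    print(from_utf8(quoted, "cp866", errors="replace"))
```

If the translated text may contain typographic characters, `errors="replace"` keeps the pipeline running at the cost of some question marks in the 866 output.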

http://www.rusf.ru/books/yo/xcode.html
http://www.rusf.ru/books/yo/xcode.html#tth_sEc2

Source code: ./src/xcodesrc.zip

The program is available for the following operating systems:

DOS: ./bin/xcodedos.zip. It is recommended to copy the program into one of the directories listed in the PATH environment variable.

Win: ./bin/xcodewin.zip. It is recommended to copy the program into %WINDOWS%\COMMAND (which often coincides with C:\WINDOWS\COMMAND). This version differs from the DOS version and is compiled as a win32 console application.

Unix: ./bin/linux.zip should work on all modern Linux distributions (the program was compiled under SuSE 8.1); ./bin/xcoderedhat71.zip is for RedHat 7.1 (no longer supported, compile from source); ./bin/xcodesun.zip is for Sun Solaris 8 (no longer supported, compile from source).


Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8

#7 Post by Eugene Lutsenko »

I could not get xcode to work. It looks as though it should handle Unicode, but it works incorrectly.

So I tried this:

@echo off
@echo Translating in progress...
powershell.exe "Get-Content -Encoding ascii 'inp_1251.txt' | Out-File -Encoding utf8 'inp_utf8.txt'"
bash.exe -l -i trans -b ru:en -i c:\Aidos-X\cygwin\bin\inp_utf8.txt -o c:\Aidos-X\cygwin\bin\out_utf8.txt
powershell.exe "Get-Content -Encoding utf8 'out_utf8.txt' | Out-File -Encoding ascii 'out_1251.txt'"
@echo Translating is finished...
The attached archive contains the result of this automatic recoding in the "bad" folder; it is no good. I then recoded the source file to UTF-8 by hand and ran the translation, and everything turned out fine; that result is in the "Good" folder. Since the translation was into English, the output files in UTF-8 and in 1251 do not differ.
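
A plausible reason for the bad result, offered as an assumption rather than a diagnosis: an ASCII codec covers only code points 0 to 127, so the Cyrillic bytes of a Windows-1251 file cannot survive a pass through it. The Python sketch below imitates that loss (it does not reproduce PowerShell's exact behavior) and contrasts it with decoding through cp1251.

```python
def decode_lossy_ascii(data: bytes) -> str:
    """Decode as 7-bit ASCII, replacing every byte above 127."""
    return data.decode("ascii", errors="replace")


def decode_cp1251(data: bytes) -> str:
    """Decode as Windows-1251, which keeps Cyrillic letters intact."""
    return data.decode("cp1251")


if __name__ == "__main__":
    raw = "Привет, world".encode("cp1251")
    print(decode_lossy_ascii(raw))  # the Cyrillic letters are lost
    print(decode_cp1251(raw))       # Привет, world
```

Once the letters have been replaced, no later conversion step can bring them back, which matches the observation that only the English output came through unchanged.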

Can you find a converter that could be inserted into this bat file to make it work correctly?

It would be nice if Alaska could recode text files from 866 and 1251 to UTF-8 and back!

PS
This recoder appears to be free of viruses:
http://kb.mista.ru/article.php?id=481
Attachment: Downloads.rar (53.9 KiB)


Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8

#8 Post by Eugene Lutsenko »

On UNIX it all turned out to be very simple: just three lines. Everything works; I will show it soon.

Under Windows, however, the resulting encoding was never the one I wanted, even though I specified it explicitly.

Under Windows I wrote this bat file:

==============================
@echo off
@echo Translating in progress...

:: ANSI -> UTF-8
chcp 1251 > nul
cmd /u /c type inp1251.txt > tmp.txt
chcp 65001 > nul
type tmp.txt > inp65001.txt

bash.exe -l -i trans -b ru:en -i C:\Aidos-X\cygwin\bin\inp65001.txt -o C:\Aidos-X\cygwin\bin\out65001.txt

:: UTF-8 -> ANSI
chcp 65001 > nul
cmd /u /c type out65001.txt > tmp.txt
chcp 1251 > nul
type tmp.txt > out1251.txt

@echo Translating is finished...
==============================

However, this part:

:: ANSI -> UTF-8
chcp 1251 > nul
cmd /u /c type inp1251.txt > tmp.txt
chcp 65001 > nul
type tmp.txt > inp65001.txt

encodes the source file inp1251.txt not in UTF-8 but in Unicode, even though code page 65001 (UTF-8) is specified explicitly.
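
A likely explanation, hedged as an assumption: the `/u` switch makes `cmd` write the output of internal commands such as `type` as Unicode, which on Windows means UTF-16 LE, and `chcp 65001` does not undo that for the redirected file. The Python sketch below uses a hypothetical heuristic (`looks_like_utf16le`, not part of any tool in this thread) to tell UTF-16 LE from UTF-8 for Russian or ASCII text, and then recodes to UTF-8.

```python
import codecs


def looks_like_utf16le(data: bytes) -> bool:
    """Heuristic: a UTF-16 LE BOM, or high bytes that are almost all
    0x00 (ASCII range) or 0x04 (Cyrillic block U+0400 to U+04FF)."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return True
    high = data[1::2]  # every second byte is the high byte in UTF-16 LE
    if not high:
        return False
    hits = sum(1 for b in high if b in (0x00, 0x04))
    return hits / len(high) > 0.9


def ensure_utf8(data: bytes) -> bytes:
    """Return UTF-8 bytes, converting from UTF-16 LE when detected."""
    if looks_like_utf16le(data):
        text = data.decode("utf-16-le")
        return text.lstrip("\ufeff").encode("utf-8")
    return data


if __name__ == "__main__":
    tmp = "Привет, мир".encode("utf-16-le")
    print(looks_like_utf16le(tmp))
    print(ensure_utf8(tmp).decode("utf-8"))
```

The heuristic is deliberately narrow: it would misjudge text in other scripts, and a cp1251 file that happens to start with the bytes FF FE would be mistaken for UTF-16.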
Last edited by Eugene Lutsenko on Wed Jan 03, 2018 2:36 am, edited 1 time in total.


Re: Files to convert to WIN 866 (OEM Russian) <=> UTF-8

#9 Post by Eugene Lutsenko »

Everything is now done in UNIX (cygwin) and works perfectly. This is the bat file:

@echo off
@echo Translating in progress...

enca -L russian inptrans.txt -x utf8
bash.exe -l -i trans -b ru:en -i C:\Aidos-X\cygwin\bin\inptrans.txt -o C:\Aidos-X\cygwin\bin\outtrans.txt
enca -L english outtrans.txt -x 1251

@echo Translating is finished...
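
The same three-step flow (recode to UTF-8, translate, recode back) can be sketched in Python. Unlike `enca`, this sketch does not detect the input encoding; the caller passes it in as `legacy`. The translator is represented by a hypothetical `translate` callback, because the way `trans` is invoked depends on the local installation.

```python
import os
import tempfile
from typing import Callable


def convert_in_place(path: str, src: str, dst: str) -> None:
    """Rewrite the file at `path` from encoding `src` to encoding `dst`."""
    with open(path, "r", encoding=src, newline="") as f:
        text = f.read()
    with open(path, "w", encoding=dst, newline="") as f:
        f.write(text)


def run_pipeline(inp: str, out: str, legacy: str,
                 translate: Callable[[str, str], None]) -> None:
    """Recode input to UTF-8, call `translate`, recode output to `legacy`."""
    convert_in_place(inp, legacy, "utf-8")
    translate(inp, out)  # stands in for the external translator call
    convert_in_place(out, "utf-8", legacy)


if __name__ == "__main__":
    def fake_translate(src_path: str, dst_path: str) -> None:
        # Placeholder for the real translator: copies the UTF-8 text as is.
        with open(src_path, "r", encoding="utf-8", newline="") as f:
            text = f.read()
        with open(dst_path, "w", encoding="utf-8", newline="") as f:
            f.write(text)

    with tempfile.TemporaryDirectory() as tmp:
        inp = os.path.join(tmp, "inptrans.txt")
        out = os.path.join(tmp, "outtrans.txt")
        with open(inp, "wb") as f:
            f.write("Привет".encode("cp1251"))
        run_pipeline(inp, out, "cp1251", fake_translate)
        with open(out, "rb") as f:
            print(f.read().decode("cp1251"))
```

Opening the files with `newline=""` keeps the original line endings, so a CR/LF file stays CR/LF after both conversions.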
