String compare

This forum is for eXpress++ general support.
Victorio
Posts: 620
Joined: Sun Jan 18, 2015 11:43 am
Location: Slovakia

String compare

#1 Post by Victorio »

Hi,

In my application I am experimenting with several comparison methods: Like(), =, ==, At(), $, etc.

There are some differences when comparing very many strings (millions of rows), so I am looking for the method that is best to use.

For example, like(text1,text2) is slower than at(text1,text2)>0, and that in turn is slower than text1 $ text2.
When comparing about 10,000,000 times, the processing time is 2.6 seconds, 1.81 seconds, and 1.52 seconds respectively.

Since my processes run for several hours, this time matters, and a better method would save me some time.

Is there a better way to compare text than the ones I have shown?

I also experimented with comparing in parts: first compare the first letter, and only when it is identical compare the full string. This also saves some seconds, but maybe there is still a better solution :)
I also tried splitting the search key into letters and comparing them as letters or ASCII numbers, but this is not effective.

If somebody has some advice, I will be grateful.

Note: I compare strings at the beginning of a row, but also at any place in the row, always within one row; this is not full-text searching. The text has its own format, with certain marker characters at the beginning of rows, etc.
The function reads the rows of the whole text file in a loop and checks whether the key text is in the row. By the way, the text file is loaded into an array.

The text can look like this:
searched text: ABCDE
row : bla bla bla .... ABCDE bla bla bla

or
row : ABCDE bla bla bla...

Does any function have better performance than $ ?
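
A minimal sketch of the row scan described in the note above. It is written in Python purely for illustration (the application in this thread is Xbase++). The function name `find_rows` and the sample rows are hypothetical. Python's `in` operator plays the role of `$`, and `str.startswith` covers the case where the key must sit at the beginning of the row.

```python
def find_rows(rows, key, at_start_only=False):
    """Return the indexes of the rows that contain key.

    rows          -- list of strings (the text file already loaded in memory)
    key           -- the searched text
    at_start_only -- if True, the key must be at the beginning of the row
    """
    hits = []
    for i, row in enumerate(rows):
        if at_start_only:
            found = row.startswith(key)
        else:
            found = key in row      # substring test, analogous to key $ row
        if found:
            hits.append(i)
    return hits


rows = [
    "bla bla bla .... ABCDE bla bla bla",
    "ABCDE bla bla bla...",
    "nothing to see here",
]
print(find_rows(rows, "ABCDE"))                      # [0, 1]
print(find_rows(rows, "ABCDE", at_start_only=True))  # [1]
```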

rdonnay
Site Admin
Posts: 4722
Joined: Wed Jan 27, 2010 6:58 pm
Location: Boise, Idaho USA

Re: String compare

#2 Post by rdonnay »

Are you using a client server such as ADS or Sql Server?
The eXpress train is coming - and it has more cars.

Auge_Ohr
Posts: 1405
Joined: Wed Feb 24, 2010 3:44 pm

Re: String compare

#3 Post by Auge_Ohr »

Hi,
Victorio wrote:by the way text file is loaded in array.
So where does the original text come from?
A file or a memo field?

---

Do you want to do a "full text search" in a database?
I use an Xbase++ "Custom Index" for FTS in memo fields.
greetings by OHR
Jimmy


Re: String compare

#4 Post by Victorio »

The original text is in an ASCII text file. I read it into a string variable and then read from this variable row by row, searching for the EOL to know where each row ends.
The files are sometimes very large, hundreds of MB, so I must divide them into 100 MB parts and process them separately.
I am only looking for a method with better performance when comparing strings.
Other methods, such as storing the data in ADS, would need total reprogramming.
Maybe a routine in C or C++ can be a solution, or moving the text-searching module to an external C++ module, as I had many years ago when the application was in CA-Clipper.
Or I could create an Xbase++ module without a GUI for better performance and call it as an external module with RunShell, but this is not very clear.
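
A minimal sketch of the chunked reading described above: the file is read in fixed-size parts, and an unfinished row at the end of one part is carried over to the next, so no row is cut at a part boundary. It is in Python for illustration only. The function name `iter_rows` is hypothetical, and the 100 MB default simply mirrors the part size mentioned in the post.

```python
import io


def iter_rows(stream, chunk_size=100 * 1024 * 1024):
    """Yield complete rows from a text stream that is read in chunks."""
    tail = ""                       # unfinished row carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        lines = (tail + chunk).split("\n")
        tail = lines.pop()          # the last piece may be an incomplete row
        for line in lines:
            yield line.rstrip("\r") # tolerate CR+LF line endings
    if tail:
        yield tail.rstrip("\r")     # last row without a trailing EOL


# Tiny chunk size to force rows to span chunk boundaries.
sample = io.StringIO("row1\r\nrow2\nrow3")
print(list(iter_rows(sample, chunk_size=5)))   # ['row1', 'row2', 'row3']
```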


Re: String compare

#5 Post by rdonnay »

If you are using a client server in your app then a SQL SELECT statement is much faster than a locate or set filter.


Re: String compare

#6 Post by Auge_Ohr »

Victorio wrote:Original text is in text ascii file.
Have you thought about regular expressions?

There is a sample from Phil Ide:
XbPCRE - PCRE (Perl Compatible Regular Expression) Library for Xbase++
xbpcre17.zip
The LIB needs to be rebuilt from the PRG for > v1.81
greetings by OHR
Jimmy
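
A minimal sketch of how a regular expression could test many keys in a single pass over a row: all keys are escaped and joined into one alternation. It uses Python's standard `re` module for illustration, not XbPCRE, whose API is not shown in this thread. The function name `compile_keys` and the sample keys are hypothetical, and the key list is assumed to be non-empty.

```python
import re


def compile_keys(keys):
    """Compile all keys into one pattern of the form key1|key2|..."""
    # Escape each key so characters such as '.' or '|' are matched literally,
    # and put longer keys first so they win over their own prefixes.
    ordered = sorted(set(keys), key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in ordered))


pattern = compile_keys(["ABCDE", "Cislo LV", "Kontr.kod"])
m = pattern.search("bla bla bla .... ABCDE bla bla bla")
print(m.group(0) if m else None)    # ABCDE
```

Because of the escaping, the `.` in `Kontr.kod` only matches a literal dot, so a row containing `KontrXkod` is not reported.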


Re: String compare

#7 Post by Victorio »

rdonnay wrote:If you are using a client server in your app then a SQL SELECT statement is much faster than a locate or set filter.
Roger, no, I am not using client server (or at least it is not important at the moment, because I need to optimize low-level text processing).
I work simply with text files stored on disk.
I have some text keys that I search for in this text; there can be 1, 2, or thousands of keys.
I compare every row of the text file with all keys, checking whether they are identical or whether the key is somewhere in the row.
When the text file has 1,000,000 rows and I compare 1 key, it takes a few seconds.
But when there are 10,000 keys, it needs 1,000,000 x 10,000 comparisons, and this takes minutes or hours.
That is why I need to eliminate any useless operations, and every millisecond is important for me.

Jimmy: I will look at this sample. Thanks.
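
With 10,000 keys and 1,000,000 rows, testing each key separately costs rows x keys substring searches. One known alternative, not proposed anywhere in this thread, is a multi-pattern algorithm such as Aho-Corasick. It builds an automaton from all keys once and then reads each row a single time, whatever the number of keys. The sketch below is in Python for illustration only. The names `build_automaton` and `find_keys` are hypothetical and keys are assumed to be non-empty. Whether this beats the native `$` operator inside an Xbase++ application would have to be measured.

```python
from collections import deque


def build_automaton(keys):
    """Build an Aho-Corasick automaton: trie, failure links, output sets."""
    goto = [{}]          # goto[state][char] -> next state
    fail = [0]           # failure link of each state
    out = [set()]        # keys that end at each state
    for key in keys:
        state = 0
        for ch in key:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append(set())
            state = nxt
        out[state].add(key)
    # Breadth-first pass: states directly below the root keep fail = 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]   # inherit keys that end at the fallback state
    return goto, fail, out


def find_keys(row, automaton):
    """Return the set of keys that occur anywhere in row, in one pass."""
    goto, fail, out = automaton
    found = set()
    state = 0
    for ch in row:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return found


automaton = build_automaton(["ABCDE", "BCD", "Stare: 1286"])
print(sorted(find_keys("bla bla bla .... ABCDE bla bla bla", automaton)))  # ['ABCDE', 'BCD']
```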

skiman
Posts: 1183
Joined: Thu Jan 28, 2010 1:22 am
Location: Sijsele, Belgium
Contact:

Re: String compare

#8 Post by skiman »

Hi,

Do you need to know which text files include your key, or do you need the row in the text file?

If there are multiple keys, do you need to know if ALL keys are in the text, or if ONE of the keys is in the text?

Can you post a sample text file and sample keys to search for?
Best regards,

Chris.
www.aboservice.be


Re: String compare

#9 Post by Victorio »

skiman wrote:Hi,

Do you need to know which text files include your key, or do you need the row in the text file?

Here is a sample:

================================================================================
| POLOZKA VZ : 1 / 2000, riadok :3 |     03.01.2000 o 11:15: 4| Kod :24342
==================================
VLASTNICI PARCIEL -- ZRUSENIE -- LIST VLASTNICTVA c: 1286 , spoluvl.: 1
Cislo LV                                Stare: 1286
Por.c.spoluvl.                          Stare: 1
Citatel vlast.pod.                      Stare: 1
Menovatel vlast.pod.                    Stare: 1
Polozka VZ                              Stare: 100
Meno,adr.vlastnika
Stare: pokusny testovaci zaznam po 1.1.2000
Kontr.kod                               Stare: 14141
================================================================================
| POLOZKA VZ : 2 / 2000, riadok :7 |     02.02.2000 o 7:40:46| Kod :252
==================================
VLASTNICI PARCIEL -- ZRUSENIE -- LIST VLASTNICTVA c: 219 , spoluvl.: 1
Cislo LV                                Stare: 219
Por.c.spoluvl.                          Stare: 1
Citatel vlast.pod.                      Stare: 6
Menovatel vlast.pod.                    Stare: 9
Polozka VZ                              Stare: 3199
Typ identifikatora                      Stare: 3
ICO ,rod.c.                             Stare: 19365817
Meno,adr.vlastnika
Stare: SMOTER LADISLAV A MARIA R LENCESOVA KOSICE
Kontr.kod                               Stare: 53722

In the text file, blocks begin with "| POLOZKA VZ ". This is one of the control "words" that tell me where a block begins; a block ends at the line
"================================================================================"

While reading, I save the rows of a block to a temporary variable, beginning with POLOZKA... and ending with ======.
Then I search this block to see whether it contains any of the x keys.
If a key is found in any row of the block, I save the whole block to the output and go on to read the next block.
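
A minimal sketch of the block handling described above, in Python for illustration only. The names `iter_blocks` and `matching_blocks` are hypothetical. The full-width separator is assumed to be a row of 80 `=` characters, as it appears in the sample; the constant must be adjusted if the real files differ. The shorter `=` row directly under the header is treated as part of the block.

```python
BLOCK_START = "| POLOZKA VZ"
BLOCK_END = "=" * 80     # assumption: full-width separator is 80 '=' characters


def iter_blocks(rows):
    """Yield each block as a list of rows, from the header row up to
    (not including) the next full-width separator row."""
    block = None
    for row in rows:
        if row.startswith(BLOCK_START):
            if block:                 # previous block had no closing separator
                yield block
            block = [row]
        elif block is not None:
            if row.rstrip() == BLOCK_END:
                yield block
                block = None
            else:
                block.append(row)
    if block:                         # last block in the file
        yield block


def matching_blocks(rows, keys):
    """Return the blocks in which at least one key occurs in any row."""
    result = []
    for block in iter_blocks(rows):
        if any(key in row for row in block for key in keys):
            result.append(block)
    return result


rows = [
    "=" * 80,
    "| POLOZKA VZ : 1 / 2000, riadok :3 |     03.01.2000 o 11:15: 4| Kod :24342",
    "=" * 34,
    "Cislo LV                                Stare: 1286",
    "=" * 80,
    "| POLOZKA VZ : 2 / 2000, riadok :7 |     02.02.2000 o 7:40:46| Kod :252",
    "=" * 34,
    "Cislo LV                                Stare: 219",
]
hits = matching_blocks(rows, ["Stare: 219"])
print(len(hits), hits[0][0][:16])    # 1 | POLOZKA VZ : 2
```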

If there are multiple keys, do you need to know if ALL keys are in the text, or if ONE of the keys is in the text?

I use a combination: all keys, sometimes one of the keys, and sometimes the keys are in pairs, and both keys of a pair must be found, for all or for any of the pairs.
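
A minimal sketch of one way to express these combinations, in Python for illustration only. The function `block_matches` and its parameters are hypothetical. The rule that one complete pair is enough is an assumption about the description above, not something the post confirms.

```python
def block_matches(block_text, all_of=(), any_of=(), pairs=()):
    """Check one block against three kinds of conditions.

    all_of -- every key in this list must occur
    any_of -- if given, at least one of these keys must occur
    pairs  -- if given, at least one (a, b) pair must occur with both members
              present (assumed interpretation)
    """
    if not all(k in block_text for k in all_of):
        return False
    if any_of and not any(k in block_text for k in any_of):
        return False
    if pairs and not any(a in block_text and b in block_text for a, b in pairs):
        return False
    return True


text = "\n".join([
    "Cislo LV                                Stare: 219",
    "ICO ,rod.c.                             Stare: 19365817",
    "Kontr.kod                               Stare: 53722",
])
print(block_matches(text, all_of=["Cislo LV", "Kontr.kod"]))   # True
print(block_matches(text, any_of=["XYZ", "19365817"]))         # True
print(block_matches(text, pairs=[("Cislo LV", "XYZ")]))        # False
```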

Can you post a sample text file and sample keys to search for?
Sample keys can be arbitrary; sometimes I use formatted keys such as:
* "ID.CISLO STAVBY c: "+alltrim(str(ICS))
* "ID.C.PRAVNEHO VZTAHU : "+alltrim(str(IDC))
* "ID.CISLO PRIESTORU c: "+alltrim(str(ICP))
* "C-PARCELY -- AKTUALIZACIA -- PARCELA c: "+alltrim(prevpar5(str(CPA)))
* "E-PARCELY -- AKTUALIZACIA -- PARCELA c: "+alltrim(prevpar5(str(CPA)))
* "E-PARCELY -- AKTUALIZACIA -- PARCELA c: "+alltrim(prevpar5(str(CPA)))+" , Umiest.: "+alltrim(str(CPU))

But sometimes the searched text can be any word, number, or combination of letters, numbers, and other characters.
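
One thing that may be worth checking with formatted keys that end in a number: a plain substring test for a key ending in `12` also matches a row where the number is `1286`. Whether this matters depends on the data, so the sketch below is only a possible guard, in Python for illustration. The helper `number_key_pattern` is hypothetical, and the prefix is taken from the sample block earlier in the thread.

```python
import re


def number_key_pattern(prefix, number):
    """Compile a pattern for prefix + number that rejects longer numbers."""
    # (?!\d) means: the match must not be followed by another digit.
    return re.compile(re.escape(prefix + str(number)) + r"(?!\d)")


row = "VLASTNICI PARCIEL -- ZRUSENIE -- LIST VLASTNICTVA c: 1286 , spoluvl.: 1"
prefix = "LIST VLASTNICTVA c: "

print((prefix + "12") in row)                               # True: plain substring also hits 1286
print(bool(number_key_pattern(prefix, 12).search(row)))     # False
print(bool(number_key_pattern(prefix, 1286).search(row)))   # True
```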
