String compare


#1 Post by **Victorio** » Wed Sep 11, 2019 9:01 am

Hi

In my application I am experimenting with several comparison methods such as Like, =, ==, at, $ etc.

There are some differences when comparing very many strings (millions of rows), so I am looking for the method that is best to use.

For example, like(text1,text2) is slower than at(text1,text2)>0, and that in turn is slower than text1 $ text2.
When comparing about 10000000x, the processing time is 2.6 seconds, 1.81 seconds and 1.52 seconds respectively.

If my processes run for several hours, this time is important, and a better method saves me some time.

Is there a better way of comparing text than the ones I have shown?

I also experimented with comparing in parts: first compare the first letter, and only when it is identical compare the full string. This also saves some seconds, but there may still be a better solution,

or dividing the search key into letters and comparing them as letters or ASCII numbers, but this is not effective.

If somebody has some advice, I will be grateful.

Note: I am comparing strings at the beginning of a row, but also at any place in the row, always within one row; this is not full text searching. The text has its own format, with some sign at the beginning of rows etc.
The function reads the rows of the whole text file in a loop and checks whether the key text is in the row; by the way, the text file is loaded into an array.

The text can look like this:
searched text ABCDE
row : bla bla bla .... ABCDE bla bla bla

or
row : ABCDE bla bla bla...

Does any function have better performance than $ ?

#2 Post by **rdonnay** » Wed Sep 11, 2019 11:49 am

Are you using a client server such as ADS or Sql Server?

#3 Post by **Auge_Ohr** » Wed Sep 11, 2019 1:17 pm

hi,

Victorio wrote:by the way text file is loaded in array.

So where does the original text come from: a file or a memo?

---

Do you want to do a "full text search" in a database?

I use an Xbase++ "Custom Index" for FTS in memos.

#4 Post by **Victorio** » Wed Sep 11, 2019 2:08 pm

The original text is in an ASCII text file. I read it into a string variable and then read from this variable row by row, searching for EOL to know where each row ends.
The files are sometimes very large, hundreds of MBytes, so I must divide them into 100 MB parts and process them separately.
I am only looking for a method with better performance when comparing strings.
Other methods, such as storing the data in ADS, would need total reprogramming.
Maybe a routine in C or C++ can be a solution, or moving the text searching to an external C++ module, as I had many years ago when the application was in CA-Clipper.
Or I could create an Xbase++ module without a GUI for better performance and call it as an external module with runshell, but this is not very clear.
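
The splitting of a large file into parts, as described above, has one pitfall: a row can be cut in two at a part boundary. A minimal sketch of one way to handle this, written in Python purely for illustration (it is not Xbase++ code and not the poster's implementation); the name `iter_rows` and the small `chunk_size` in the usage lines are hypothetical:

```python
import io

def iter_rows(stream, chunk_size=100 * 1024 * 1024):
    """Yield complete rows from a binary stream that is read in fixed-size parts."""
    tail = b""  # incomplete row carried over from the previous part
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        lines = (tail + chunk).split(b"\n")
        tail = lines.pop()  # the last piece may be an unfinished row
        for line in lines:
            yield line.rstrip(b"\r")  # tolerate CR+LF line ends
    if tail:
        yield tail.rstrip(b"\r")  # final row without a trailing EOL

# Usage with a tiny part size, so that rows are split across parts on purpose.
data = io.BytesIO(b"ABCDE bla bla\r\nbla bla ABCDE bla\nlast row")
rows = list(iter_rows(data, chunk_size=4))
```

Only the unfinished tail of each part is kept in memory between reads, so the part size can be tuned freely without losing or duplicating rows.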

#5 Post by **rdonnay** » Wed Sep 11, 2019 2:25 pm

If you are using a client server in your app then a SQL SELECT statement is much faster than a locate or set filter.

#6 Post by **Auge_Ohr** » Wed Sep 11, 2019 2:41 pm

Victorio wrote:Original text is in text ascii file.

Have you thought about regular expressions?

There is a sample from Phil Ide:

XbPCRE - PCRE (Perl Compatible Regular Expression) Library for Xbase++

xbpcre17.zip (300.11 KiB): the LIB needs to be rebuilt from the PRG for > v1.81

#7 Post by **Victorio** » Thu Sep 12, 2019 1:02 am

rdonnay wrote:If you are using a client server in your app then a SQL SELECT statement is much faster than a locate or set filter.

Roger, no, I am not using client server (or at this moment it is not important, because I need to optimize low-level text processing).
I work simply with text files stored on disk.
I have some text keys which I search for in this text; there can be 1, 2 or thousands of these keys.
I compare every row of the text file with all keys, to see whether they are identical or whether the key is somewhere in the row.
When the text file has 1000000 rows and I compare 1 key, the time is some seconds.
But when there are 10000 keys, 1000000x10000 comparisons are needed, and this takes minutes or hours.
That is why I need to eliminate any useless operations, and every millisecond is important for me.
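
One way to avoid the rows x keys product described above is to look for all keys in a single pass over each row. The following is a minimal sketch of that idea using the Aho-Corasick algorithm, a technique that nobody in this thread has proposed; it is written in Python for illustration only, assumes non-empty keys, and the names `build_automaton` and `find_keys` are hypothetical:

```python
from collections import deque

def build_automaton(keys):
    """Build trie transitions, failure links and outputs for the given keys."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # indices of the keys recognised at each state
    for idx, key in enumerate(keys):
        state = 0
        for ch in key:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(idx)
    # Breadth-first pass: states directly below the root keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Keys ending at the failure target also end here.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_keys(row, automaton):
    """Return the set of key indices that occur anywhere in row."""
    goto, fail, out = automaton
    state = 0
    found = set()
    for ch in row:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found.update(out[state])
    return found

# Usage: the automaton is built once and then reused for every row.
keys = ["ABCDE", "CDEF", "Stare: 219"]
automaton = build_automaton(keys)
rows = ["bla bla bla .... ABCDEF bla", "Cislo LV   Stare: 219", "nothing here"]
hits = [sorted(find_keys(row, automaton)) for row in rows]
```

The cost per row then depends mainly on the row length and the number of matches rather than on the number of keys. Whether this actually beats repeated `$` calls in an Xbase++ program would have to be measured.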

Jimmy: I will look at this sample, thanks.

#8 Post by **skiman** » Thu Sep 12, 2019 3:33 am

Hi,

Do you need to know which text files include your key, or do you need the row in the text file?

If there are multiple keys, do you need to know if ALL keys are in the text, or if ONE of the keys is in the text?

Can you post a sample text file and sample keys to search for?

#9 Post by **Victorio** » Thu Sep 12, 2019 4:28 am

skiman wrote:Hi,

Do you need to know which text files include your key, or do you need the row in the text file?

Here is a sample:
================================================================================
| POLOZKA VZ : 1 / 2000, riadok :3 |     03.01.2000 o 11:15: 4| Kod :24342
==================================
VLASTNICI PARCIEL -- ZRUSENIE -- LIST VLASTNICTVA c: 1286 , spoluvl.: 1
Cislo LV                                Stare: 1286
Por.c.spoluvl.                          Stare: 1
Citatel vlast.pod.                      Stare: 1
Menovatel vlast.pod.                    Stare: 1
Polozka VZ                              Stare: 100
Meno,adr.vlastnika
Stare: pokusny testovaci zaznam po 1.1.2000
Kontr.kod                               Stare: 14141
================================================================================
| POLOZKA VZ : 2 / 2000, riadok :7 |     02.02.2000 o 7:40:46| Kod :252
==================================
VLASTNICI PARCIEL -- ZRUSENIE -- LIST VLASTNICTVA c: 219 , spoluvl.: 1
Cislo LV                                Stare: 219
Por.c.spoluvl.                          Stare: 1
Citatel vlast.pod.                      Stare: 6
Menovatel vlast.pod.                    Stare: 9
Polozka VZ                              Stare: 3199
Typ identifikatora                      Stare: 3
ICO ,rod.c.                             Stare: 19365817
Meno,adr.vlastnika
Stare: SMOTER LADISLAV A MARIA R LENCESOVA KOSICE
Kontr.kod                               Stare: 53722

In the text file there are blocks beginning with "| POLOZKA VZ "; this is one of the control "words" by which I know where a block begins, and it ends at
"================================================================================"

When reading, I save the rows of the block into a temporary variable, beginning with POLOZKA... and ending with ======.
Then I search this block to see whether it contains any of the x keys;
if a key is found in any row of the block, I save the whole block to the output and go on to read the next block.
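
The block handling described above can be sketched roughly as follows. This is an illustration in Python, not the poster's Xbase++ code; the names `iter_blocks` and `blocks_with_any_key` are hypothetical, and the end marker is assumed to be a row of exactly 80 `=` characters, as in the sample:

```python
BLOCK_START = "| POLOZKA VZ "
BLOCK_END = "=" * 80  # assumption: the separator row is exactly 80 '=' characters

def iter_blocks(lines):
    """Yield each block as a list of rows, from the start marker up to the end marker."""
    block = None
    for line in lines:
        line = line.rstrip()
        if line.startswith(BLOCK_START):
            block = [line]          # a new block begins here
        elif block is not None:
            if line == BLOCK_END:
                yield block         # separator closes the current block
                block = None
            else:
                block.append(line)
    if block:
        yield block                 # last block without a closing separator

def blocks_with_any_key(lines, keys):
    """Yield only the blocks in which at least one key occurs in some row."""
    for block in iter_blocks(lines):
        if any(key in row for row in block for key in keys):
            yield block

# Usage with a shortened version of the sample above.
sample = [
    "=" * 80,
    "| POLOZKA VZ : 1 / 2000, riadok :3 |     03.01.2000 o 11:15: 4| Kod :24342",
    "=" * 34,
    "Cislo LV                                Stare: 1286",
    "=" * 80,
    "| POLOZKA VZ : 2 / 2000, riadok :7 |     02.02.2000 o 7:40:46| Kod :252",
    "=" * 34,
    "Cislo LV                                Stare: 219",
]
matching = list(blocks_with_any_key(sample, ["Stare: 219"]))
```

Because `any` stops at the first hit, the remaining rows and keys of a block are not examined once the block is known to match.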

If there are multiple keys, do you need to know if ALL keys are in the text, or if ONE of the keys is in the text?

I am using a combination: all keys, sometimes one of the keys, or sometimes the keys are in pairs and both keys of a pair must be found, and all or any of them.
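
The three matching modes mentioned above (all keys, any key, keys in pairs) could be expressed as small predicates. A minimal sketch in Python for illustration only; the function names are hypothetical, and the `require_all_pairs` switch is an assumption about what "all or any" means for pairs:

```python
def contains_all(text, keys):
    """True when every key occurs somewhere in text."""
    return all(key in text for key in keys)

def contains_any(text, keys):
    """True when at least one key occurs somewhere in text."""
    return any(key in text for key in keys)

def pairs_match(text, pairs, require_all_pairs=False):
    """A pair counts as found only when both of its keys occur in text."""
    results = (first in text and second in text for first, second in pairs)
    return all(results) if require_all_pairs else any(results)

# Usage on a few rows taken from the sample block, joined into one string.
block = "\n".join([
    "VLASTNICI PARCIEL -- ZRUSENIE -- LIST VLASTNICTVA c: 219 , spoluvl.: 1",
    "Cislo LV                                Stare: 219",
    "Kontr.kod                               Stare: 53722",
])
pairs = [("c: 219", "Stare: 53722"), ("c: 1286", "Stare: 14141")]
any_pair = pairs_match(block, pairs)
every_pair = pairs_match(block, pairs, require_all_pairs=True)
```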

Can you post a sample text file and sample keys to search for?

Sample keys can be random; sometimes I use formatted keys such as:
* "ID.CISLO STAVBY c: "+alltrim(str(ICS))
* "ID.C.PRAVNEHO VZTAHU : "+alltrim(str(IDC))
* "ID.CISLO PRIESTORU c: "+alltrim(str(ICP))
* "C-PARCELY -- AKTUALIZACIA -- PARCELA c: "+alltrim(prevpar5(str(CPA)))
* "E-PARCELY -- AKTUALIZACIA -- PARCELA c: "+alltrim(prevpar5(str(CPA)))
* "E-PARCELY -- AKTUALIZACIA -- PARCELA c: "+alltrim(prevpar5(str(CPA)))+" , Umiest.: "+alltrim(str(CPU))

but sometimes the searched text can be any word, number, or combination of letters, numbers and any signs.

bb.donnay-software.com
