Processing large database - acceleration

This forum is for general support of Xbase++
Victorio
Posts: 621
Joined: Sun Jan 18, 2015 11:43 am
Location: Slovakia

Processing large database - acceleration

#1 Post by Victorio »

Hi,
In my application I process a database with 100,000-200,000 records. I am using a FOXCDX index.
The processing is simple: a "do while" loop does some mathematical operations and reads information from other tables - see the structure at the bottom.
This simple loop consumes a lot of time. The example shows only part of my function; I have several loops, and processing a database with 100,000 records takes about 25 minutes (!) on a PC with a Core 2 Duo processor. The processor is at 100%, also on a 4-core PC. On my notebook with Windows 8.1 64-bit, processing takes 50 minutes, and that is bad!

Can I do something to speed this up? Would it help to process the DBF file in an array and then write the result back to the DBF?
Some time is consumed by SEEK, then by the REPLACE and APPEND BLANK commands, ...
Or do DBF files (FoxPro CDX index) have limits, and would it help to change to another type of database?

And what will happen when my app has to process a database with 500,000 records or more? It will be unbearable.

Thanks

SELECT 1
use data1 index data1
SELECT 9
use data2 index data2

SELECT 1
set order to 1
go top
do while eof()!=.T.
   ...
   rec=recno()
   SELECT 9
   set order to 3
   seek something
   if found()
      y=y*10
   else
      y=0
   endif
   if rlock()
      replace x with y,...
   else
      ... warning..
   endif
   SELECT 1
   go rec
   skip
enddo

rdonnay
Site Admin
Posts: 4729
Joined: Wed Jan 27, 2010 6:58 pm
Location: Boise, Idaho USA

Re: Processing large database - acceleration

#2 Post by rdonnay »

Is this process done every day?

You could write a program that runs on a server and have it run automatically only at night.

You could use ADSDBE for SQL and do this in a single SQL statement. It would easily run 100 times faster.

You could set a relation to the seeked database and use the REPLACE X WITH (expression) FOR (expression). I'm not sure how much faster this would be but my guess would be about 3 to 10 times faster.
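Something like this (a rough sketch only, using the names from your example; it assumes data2 has a tag on the seek expression and that the whole table can be locked with FLock() instead of an RLock() per record):

Code: Select all

USE data1 INDEX data1 ALIAS data1 NEW
USE data2 INDEX data2 ALIAS data2 NEW
data2->(OrdSetFocus(3))

SELECT data1
SET ORDER TO 1
// the relation repositions data2 automatically for every data1 record
SET RELATION TO something INTO data2

IF FLock()
   // <new value> stands for whatever you compute inside your loop now;
   // records without a match in data2 are left unchanged here
   REPLACE x WITH <new value> FOR !data2->(Eof())
ENDIF
SET RELATION TO

The explicit loop disappears completely; the relation does the SEEK into data2 for you.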
The eXpress train is coming - and it has more cars.

Tom
Posts: 1171
Joined: Thu Jan 28, 2010 12:59 am
Location: Berlin, Germany

Re: Processing large database - acceleration

#3 Post by Tom »

Hi, Victorio.

1. Don't set the order again and again (SET ORDER TO 3). This consumes time. If several indexes are used for searching, check whether the right order is already active and only switch if it is not (IF IndexOrd() != nOrder, SET ORDER; otherwise do nothing). If only one index is used, set the order once at the beginning.

2. You save the record number and jump back to it even when the record pointer is already on that record. Avoid that. If no other navigation is done in the main table, remove the "DbGoto". If navigation may happen, compare the record numbers before jumping. (See the sketch below.)

3. Think about using a relation instead of a seek routine. With a relation, navigation in the child table only happens when the child table is actually accessed.
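A short sketch of points 1 and 2 (nOrder and nSavedRec are just illustration names, not from your code):

Code: Select all

// 1. only switch the order if it is not the active one yet
IF IndexOrd() != nOrder
   SET ORDER TO nOrder
ENDIF

// 2. only reposition if the pointer has really moved
IF RecNo() != nSavedRec
   DbGoto( nSavedRec )
ENDIF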
Best regards,
Tom

"Did I offend you?"
"No."
"Okay, give me a second chance."

Victorio
Posts: 621
Joined: Sun Jan 18, 2015 11:43 am
Location: Slovakia

Re: Processing large database - acceleration

#4 Post by Victorio »

Roger:

The process runs only once per month; it imports and processes the current data on the client's PC. He must run it manually and wait until the import is done. The long wait is not good for the credibility of my application :roll:

ADSDBE and SQL I do not know :oops: , I must study them...

SET RELATION? I must try it...

Tom:
1. Yes, the source needs optimization. I tried removing the repeated SET ORDER, but with only a small effect, about 7%.
I can examine IndexOrd().
2. Yes, good tip: test first, and only use DbGoto when the record pointer has jumped to a position other than the one I need.
3. Relation... I can try it.

Thank you very much.

skiman
Posts: 1185
Joined: Thu Jan 28, 2010 1:22 am
Location: Sijsele, Belgium

Re: Processing large database - acceleration

#5 Post by skiman »

Hi,

Change

Code: Select all

SELECT 1
use data1 index data1
SELECT 9
use data2 index data2

SELECT 1
set order to 1
go top
do while eof()!=.T.
   ...
   rec=recno()
   SELECT 9
   set order to 3
   seek something
   if found()
      y=y*10
   else
      y=0
   endif
   if rlock()
      replace x with y,...
   else
      ... warning..
   endif
   SELECT 1
   go rec
   skip
enddo
to

Code: Select all

use data1 index data1 alias data1 new exclu
use data2 index data2 alias data2 new READONLY
data2->(OrdSetFocus( name of tag ))    // select the tag to seek on

do while !data1->(eof())
   ...
   data2->(dbseek( data1->something ))
   if data2->(found())
      y := y * 10
      data1->x := y
   else
      y := 0
      ... warning
   endif
   data1->(dbskip(1))
enddo
Using a relation could be faster, but I never use it myself.
Best regards,

Chris.
www.aboservice.be

Auge_Ohr
Posts: 1407
Joined: Wed Feb 24, 2010 3:44 pm

Re: Processing large database - acceleration

#6 Post by Auge_Ohr »

Victorio wrote: Can I do something to speed this up?
Hm ... did you try UPDATE ?

Code: Select all

   Use Data1 ALIAS Data1 index IndexX EXCLUSIVE
   Use Data2 ALIAS Data2 index IndexY EXCLUSIVE

   UPDATE ON Data1 FROM Data2 ; 
   IF(FOUND(),;
   REPLACE Data2->x WITH Data2->y*10,;
   REPLACE Data2->x WITH 0)
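
A form closer to the documented UPDATE ON ... FROM ... REPLACE syntax (key1 is a made-up key field that both tables are indexed on; records without a match in Data2 simply keep their old x, so the FOUND() branch is not needed):

Code: Select all

   USE Data1 ALIAS Data1 INDEX IndexX EXCLUSIVE NEW
   USE Data2 ALIAS Data2 INDEX IndexY EXCLUSIVE NEW

   SELECT Data1
   UPDATE ON key1 FROM Data2 ;
      REPLACE x WITH Data2->y * 10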
greetings by OHR
Jimmy

Victorio
Posts: 621
Joined: Sun Jan 18, 2015 11:43 am
Location: Slovakia

Re: Processing large database - acceleration

#7 Post by Victorio »

Jimmy: here I wrote only a small example; the real function is more complicated. Here it is:

*****************************
* FUNCTION FIDSPR1 * && helper function for processing the identification file
*****************************
FUNCTION fidspr1

* for the test button in the preview
* for testing with the toolbar and the browser; if it is not declared here, GetOptions is taken over from another function
Local GetList := {},cCakaj
Local GetOptions

* select the processing stage
menufid=1
DCMSGBOX "Vyber etapu spracovania identifikácií :" ;
TITLE "Dialóg" ;
FONT fontnt ;
BUTTONS {"1.etapa - Kontrola existencie C-parciel a E-parciel v ISKN ", ;
"2.etapa - Zistenie informacií z ISKN a zápis do IDF databázy" } ;
CHOICE @menufid

* Check the existence of C-parcels and E-parcels in ISKN
* at the same time, identifications of objects split by different soil quality classes are merged
if menufid=1 .and. naozaj("Chcete naozaj spustiť kontrolu existencie C-parciel a E-parciel v ISKN ?")

* display the processing message
cCakaj=DC_WaitOn("Kontrola existencie C-Parciel a E-parciel v ISKN...")

* merge identifications caused by E-parcel objects being split
* by different soil quality classes
set order to 1
* remember the data of the current record
pomcpaci=cpaci
pomcpaei=cpaei
pompvei=pvei
pomcpac=cpac
pomcpae=cpae
pomcpue=cpue
pompveiu=pveiu
pomcpackh=cpackh
pomcpaekh=cpaekh
skip
do while eof()!=.T.
@14,17 dcsay "Zlucenie zaznamov identifikacii "+alltrim(str(recno()))+space(5)
* if this is a duplicate record (same C and E parcels)
if cpac=pomcpac .and. cpae=pomcpae .and. cpue=pomcpue
rec=recno()
* step back one record
skip -1
* remember the intersection area
dpompvei=pvei
dpompveiu=pveiu
* and delete the record
dbzmaz()
* go back to the original record and write
* the summed area there
go rec
if reclock()
pom1=str(val(pvei)+val(dpompvei))
pom1=padl(pom1,14," ")
replace pvei with pom1
pom2=pveiu+dpompveiu
replace pveiu with pom2
else
rpozor()
endif
* move to the next record
skip
* if the record is not a duplicate
else
* remember the data of the current record
pomcpaci=cpaci
pomcpaei=cpaei
pompvei=pvei
pomcpac=cpac
pomcpae=cpae
pomcpue=cpue
pompveiu=pveiu
pomcpackh=cpackh
pomcpaekh=cpaekh
* move to the next record
skip
endif
enddo
* end of merging

* C-parcels
set order to 1
go top
do while eof()!=.T.
@14,17 dcsay "Kontrola existencie C-parciel "+alltrim(str(recno()))+space(5)
rec=recno()
pomcpa=cpac
SELECT 1
set order to 1
seek pomcpa
if found()
pomcpackh=.T.
else
pomcpackh=.F.
endif
SELECT 9
if reclock()
replace cpackh with pomcpackh
if pomcpackh=.F.
replace KN_VLA with "Chyba !!!, C-parcela nenajdena"
endif
else
rpozor()
endif
go rec
skip
enddo

* E-parcels
set order to 1
go top
do while eof()!=.T.
@14,17 dcsay "Kontrola existencie E-parciel "+alltrim(str(recno()))+space(5)
rec=recno()
pomcpa=cpae
pomcpu=cpue
SELECT 2
set order to 3
pomcpa=pomcpa*100+pomcpu
seek pomcpa
if found()
pomcpaekh=.T.
else
pomcpaekh=.F.
endif
SELECT 9
if reclock()
replace cpaekh with pomcpaekh
if pomcpaekh=.F.
replace KN_VLA with KN_VLA+" | Chyba !!!, E-parcela nenajdena"
endif
else
rpozor()
endif
go rec
skip
enddo

* database packing is added here, because of later problems with print reports and the UNIQUE index
*****************************************************
* added 6.7.2015 - still needs handling for operation on a network
* removal of duplicate cancelled records
* close the identification database
close IDENT
* switch to exclusive mode
set exclusive on
* reopen the database
SELECT 9
use &cestakn.&nazovdbf9. index &cestakn.&nazovdbf9. ALIAS IDENT
* PACK
Dc_pack(.F.,"F")
* close the database
close IDENT
* switch exclusive mode off again
set exclusive off
* reopen the database
SELECT 9
use &cestakn.&nazovdbf9. index &cestakn.&nazovdbf9. ALIAS IDENT

DC_Impl(cCakaj)

* Retrieve information from ISKN and write it to the IDF database
elseif menufid=2

if naozaj("Chcete naozaj spustiť zistenie informacií z ISKN a zápis do IDF databázy ?")

* display the processing message
cCakaj=DC_WaitOn("Zistenie informacií z ISKN a zápis do IDF databázy")

* C-parcels
set order to 1
go top
do while eof()!=.T.
@14,17 dcsay "Zistovanie udajov o C-parcelach "+alltrim(str(recno()))+space(5)
rec=recno()
pomcpa=cpac
SELECT 1
set order to 1
seek pomcpa
if found()
pomdrpc=kn_drp
pomumpc=kn_ump
pomclvc=kn_clv
pomcelc=kn_cel
pomvymc=kn_vym

else
pomdrpc=0
pomumpc=0
pomclvc=0
pomcelc=0
pomvymc=0
endif
SELECT 9
if reclock()
replace drpc with pomdrpc
replace umpc with pomumpc
replace clvc with pomclvc
replace celc with pomcelc
replace vymc with pomvymc
else
rpozor()
endif
go rec
skip
enddo

* E-parcels
set order to 1
go top
do while eof()!=.T.
@14,17 dcsay "Zistovanie udajov o E-parcelach "+alltrim(str(recno()))+space(5)
rec=recno()
pomcpa=cpae
pomcpu=cpue
SELECT 2
set order to 4
pomcpa=pomcpa*100+pomcpu
seek pomcpa
if found()
pomdrpe=kn_drp
pomumpe=kn_ump
pomclve=kn_clv
pomcele=kn_cel
pomvyme=kn_vym

else
pomdrpe=0
pomumpe=0
pomclve=0
pomcele=0
pomvyme=0
endif
SELECT 9
if reclock()
replace drpe with pomdrpe
replace umpe with pomumpe
replace clve with pomclve
replace cele with pomcele
replace vyme with pomvyme
else
rpozor()
endif
go rec
skip
enddo

* Owners
* the algorithm looks up the owner by the LV (title deed) number; if
* found, the owner's data is written into the ID database; if there are
* several owners, the record must be duplicated and the data added to the copy

* order the ID base by row number
set order to 1
* go to the beginning of the file
go top

* initialize the owner counter
pocetvl=0

* loop scanning the whole ID database file
do while eof()!=.T.
@14,17 dcsay "Zistovanie udajov o Vlastnikoch-(ID) "+alltrim(str(recno()))+space(5)
recid=recno()
* get the LV number of the E-parcel
pomclve=clve

* get the data of the current ID record
pomcpaci=CPACI
pomcpaei=CPAEI
pompvei=PVEI
pomcpac=CPAC
pomvymc=VYMC
pomdrpc=DRPC
pomclvc=CLVC
pomcelc=CELC
pomumpc=UMPC
pomcpae=CPAE
pomcpue=CPUE
pomvyme=VYME
pomdrpe=DRPE
pomclve=CLVE
pomcele=CELE
pomumpe=UMPE
pomcris=CRIS
pompveiu=PVEIU
pomcpackh=CPACKH
pomcpaekh=CPAEKH
pompveiu=PVEIU

* open the owners file
SELECT 4
* order by LV number
set order to 1
* seek the title deed (LV)
seek pomclve
* reset the owner counter to 0
pocetvl=0

* if found
if found()

* loop scanning all owners
* of the given LV
do while kn_clv=pomclve
@14,17 dcsay "Zistovanie udajov o Vlastnikoch (VL) "+alltrim(str(recno()))+space(5)
pocetvl++
recvl=recno()

* get the owner's data
pompcs=kn_pcs
pomcit=kn_cit
pommen=kn_men
pomkpv=kn_kpv
pomico=kn_ico
pomrci=kn_rci
pomvla=kn_vla
pomtuc=kn_tuc
pomtvl=kn_tvl

* calculate the share area
pomvympod=(pomcit/pommen)*pomvyme

* calculate the share area from the intersection area
pomvympodpv=(pomcit/pommen)*pompveiu

* write the owner's data into the ID database
SELECT 9
go recid

* if this is the first owner
if kn_pcs==0
if reclock()
* write the owner's data
replace kn_pcs with pompcs
replace kn_cit with pomcit
replace kn_men with pommen
replace kn_kpv with pomkpv
replace kn_ico with pomico
replace kn_rci with pomrci
replace kn_vla with pomvla
replace kn_tvl with pomtvl
replace kn_tuc with pomtuc
replace vympod with pomvympod
replace vympodpv with pomvympodpv
else
rpozor()
endif
* if this is the second or a further owner
else
append blank
recid=recno()

if reclock()
* write the data of the current ID record
replace cpaci with pomCPACI
replace cpaei with pomCPAEI
replace pvei with pomPVEI
replace cpac with pomCPAC
replace vymc with pomVYMC
replace drpc with pomDRPC
replace clvc with pomCLVC
replace celc with pomCELC
replace umpc with pomUMPC
replace cpae with pomCPAE
replace cpue with pomCPUE
replace vyme with pomVYME
replace drpe with pomDRPE
replace clve with pomCLVE
replace cele with pomCELE
replace umpe with pomUMPE
replace cris with pomCRIS
replace pveiu with pomPVEIU
replace cpackh with pomCPACKH
replace cpaekh with pomCPAEKH

* write the owner's data
replace kn_pcs with pompcs
replace kn_cit with pomcit
replace kn_men with pommen
replace kn_kpv with pomkpv
replace kn_ico with pomico
replace kn_rci with pomrci
replace kn_vla with pomvla
replace kn_tuc with pomtuc
replace kn_tvl with pomtvl
replace vympod with pomvympod
replace vympodpv with pomvympodpv

else
rpozor()
endif

endif
SELECT 4
go recvl
skip
enddo

* if not found
else
*pompcs=0
*pomcit=0
*pommen=0
*pomkpv=0
*pomico=0
*pomrci=0
*pomvla=space(0)
*pomtvl=0

endif
*
SELECT 9
go recid
skip
enddo
*

DC_Impl(cCakaj)

endif

* DC_Impl(cCakaj)

endif
*

DC_GetDestroy(GetList)
DC_ClearEvents()

set order to 1
go top

return(0)

Victorio
Posts: 621
Joined: Sun Jan 18, 2015 11:43 am
Location: Slovakia

Re: Processing large database - acceleration

#8 Post by Victorio »

First I must optimize some operations with SET ORDER and RecNo(), and combine several REPLACE commands into one.
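For example, instead of five REPLACE commands per record, one REPLACE with a field list (a sketch using the field names from my function above):

Code: Select all

if reclock()
   replace drpc with pomdrpc, umpc with pomumpc, clvc with pomclvc, ;
           celc with pomcelc, vymc with pomvymc
else
   rpozor()
endif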

Then I will try the other things that Roger and Tom wrote about (relation and the rest). Maybe SQL is the right way...

But I must start work on a new project, which will work with large databases, about 200 GB :shock: , and I do not know what awaits me.
It will not be one database but about 2000 DBF files. Here I need multicore processing, or some virtual servers, to process the data. The processing runs only once, but it can take 24 hours a day for several days - I hope not weeks :?:
After processing, everything must be indexed and set in relations.
In my old DOS application in CA Clipper I did not use DBF; instead I converted to a TXT ASCII file zipped with a special Lempel-Ziv algorithm, and external C++ utilities were used for processing and searching. That program was about 7-10 times faster than the same program in FoxPro.

Auge_Ohr
Posts: 1407
Joined: Wed Feb 24, 2010 3:44 pm

Re: Processing large database - acceleration

#9 Post by Auge_Ohr »

Victorio wrote: In my old DOS application in CA Clipper I did not use DBF; instead I converted to a TXT ASCII file zipped with a special Lempel-Ziv algorithm, and external C++ utilities were used for processing and searching. That program was about 7-10 times faster than the same program in FoxPro.
you should first say what you have to do ...

Read the TXT into an array (up to 2 GB) and compare the array against the DBF.
read TXT into Array up to 2GB and compare Array against DBF.
greetings by OHR
Jimmy

Victorio
Posts: 621
Joined: Sun Jan 18, 2015 11:43 am
Location: Slovakia

Re: Processing large database - acceleration

#10 Post by Victorio »

Jimmy, by processing the TXT file I mean my new project, which I want to reprogram from CA Clipper.
The optimization problem is in the existing application, but I am studying the things you all wrote... I am sure the information from you, Tom and Roger will help me.

In the new project I have many databases in DBF and FPT (FoxPro 2.x format), about 2000 files. The size of the databases ranges from 0 B to about 500-800 MB.
In these databases I need to search for various information, numbers and text, and generate some reports.
Sometimes the search is only in one or two databases, sometimes in all of them, etc.
If I do this with DBF/FPT, I need a large data disk, as I said about 200-300 GB, and searching in this data will be very slow.

Because of this I programmed a utility that converts these DBF files to ASCII files, packed and encrypted; after conversion the size of the files is about 1/10 of the source DBF!
And then I search in these files.
The packer, encrypter and search "machine" are programmed in C, so it is very fast, about 7 times faster than searching in the source DBF files.
This was true for CA Clipper for DOS compared with FoxPro 2.x.

I do not know how it will behave in Alaska Xbase++ :?: So I must program an example for testing.
I know it is a very "primitive" system, but it has worked well for 20 years, although in DOS; it also works on Windows XP, 7, ... but only on 1/100 of the data size.
The new application must be faster than the DOS CA Clipper application.

OK, this is only for clarification.

Viktor
