Page 2 of 2

Re: Excel remove duplicates

Posted: Tue Feb 27, 2024 5:17 am
by Victorio
Ok, understand, thanks.

Re: Excel remove duplicates

Posted: Tue Feb 27, 2024 8:09 am
by Victorio
My plan to sort records in DBF via INDEX UNIQUE failed because my index key is longer than 240 characters,
I also tried Sort, it takes it, but again I would have to deal with comparing a lot of values (35 fields).
I haven't tried the activeX SORT method yet because I'm very curious as to why RemoveDuplicates doesn't want to take my Columns as an array.
I've tried different syntaxes but it doesn't want to accept anything except a single number

For example i tryed this :

x:=array(3)
x[1]:=0
x[2]:=1
x[3]:=2
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates((x),Header:=xlYes)

or :
oSheet:Range("A1:F15"):RemoveDuplicates(Columns:=Array(1,2), Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates("A:C", Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates(oSheet:Columns("A:C"), Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(,Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(("A:E"),Header:=xlYes)

still get error Invalid index

Does anyone have an example of the source code (not VBA, because it doesn't take it) where the RemoveDuplicates function is used with the columns and xl parameters?

Syntax of VBA is RemoveDuplicates Columns:=varArray1(index), Header:=xlNo

Re: Excel remove duplicates

Posted: Wed Feb 28, 2024 1:13 am
by skiman
Hi Victorio,

I was wondering what you want to do? It can't be that problem with Xbase++.

There are 34 colums/fields which you want to test if they are duplicated? Do you want to check if all fields are the same, or if ONE of these fields is the same?

An indexkey can't be that long, but you could use an array, depending on the size of the file. If you create an array with { recno() , field1+field2+... } then you can search for duplicates in the array. When you find a duplicate you have the recno() to process as you want.

Is you data in a dbf at start or is it an xls you want to process? Maybe post a sample of your data?

Re: Excel remove duplicates

Posted: Wed Feb 28, 2024 1:42 am
by Victorio
Hi Chris,
At the beginning I have a DBF (Visual Foxpro structure) and later I generate XLSX from it
I originally wanted to solve it through Excel, but I don't know because of the problem.
I need to delete records that have the same 35 out of 36 fields.
I wanted to avoid working with arrays because I don't know how fast it will be with large databases.
But yesterday I tried to program a different way using DBF using Var2Bin() , it looks like it should work, but I'm just going to test it:

Code: Select all

SELECT 1
use d:\#POKUSY\TEST_Excel\POKUS2 EXCLUSIVE
go top
do while !eof()
	replace unik with .T.	// 
	skip
enddo

go top
prvy:=1
duplicit:=0
lastrec:=""
for i=1 to 36
	lastrec:=lastrec+Var2Bin(FieldGet(i))	
next

do while !eof()
	aktrec:=""
	for i=1 to 36
		aktrec:=aktrec+Var2Bin(FieldGet(i))
	next
	if aktrec==lastrec .and. prvy!=1
		replace unik with .F.	
		duplicit++
	endif
	if prvy=1
		prvy=0
	endif
	lastrec:=aktrec
	skip
enddo
close all

So, this is only first try without code optimization

Re: Excel remove duplicates

Posted: Wed Feb 28, 2024 6:14 am
by SlavkoDam
Hi Viktor,

Since you want to compare 35 out of 36 fields for equality, my Excel code can be modified not to compare each column values as it is now, but to compare values of entire rows, or values of a range of columns. So, there will be only one compare statement per each row, and no need to sum values of columns in the range, as in your test program. I believe you test program will work too.

Slavko

Re: Excel remove duplicates

Posted: Mon Mar 04, 2024 1:12 am
by k-insis
How many chars?!

As it is only for unique, you should change method how you determine uniqueness of some record data.

Put everything into expression into function and calculate SHA (hash value) of fields you want to account for unique.
Hash functions are extremely fast in any but musem grade computers.

FUNCTION MySomethingUq()
LOCAL cSomething := ""
LOCAL cHash := ""

//concat long string
cSomething := SOMEFIELD+SOMEFIELD2+DTOS(SOMEDATE)+STRZERO(SOMENUM,15,4)
cHash := Char2Hash( cSomething, 256 ) //you might use 128 too as this is only for comparison not security use

RETURN cHash

DBUseArea(.t.,,"somefile","hashwoes",.t.,.t.)
hashwoes->(dbcreateindex("someuniq","MySomethingUq()",{|| hashwoes->(MySomethingUq()) }, .T.))



Links:

https://doc.alaska-software.com/content ... 2hash.html


Victorio wrote: Tue Feb 27, 2024 8:09 am My plan to sort records in DBF via INDEX UNIQUE failed because my index key is longer than 240 characters,
I also tried Sort, it takes it, but again I would have to deal with comparing a lot of values (35 fields).
I haven't tried the activeX SORT method yet because I'm very curious as to why RemoveDuplicates doesn't want to take my Columns as an array.
I've tried different syntaxes but it doesn't want to accept anything except a single number

For example i tryed this :

x:=array(3)
x[1]:=0
x[2]:=1
x[3]:=2
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates((x),Header:=xlYes)

or :
oSheet:Range("A1:F15"):RemoveDuplicates(Columns:=Array(1,2), Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates("A:C", Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates(oSheet:Columns("A:C"), Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(,Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(("A:E"),Header:=xlYes)

still get error Invalid index

Does anyone have an example of the source code (not VBA, because it doesn't take it) where the RemoveDuplicates function is used with the columns and xl parameters?

Syntax of VBA is RemoveDuplicates Columns:=varArray1(index), Header:=xlNo

Re: Excel remove duplicates

Posted: Mon Mar 04, 2024 3:18 am
by Victorio
Yes, i think about use HASH, but now works fine also with Var2Bin without problems.
First I sort DBF to other temporary DBF, then in cycle delete sign duplicates, then delete it,
and final i append to new database.
When I use PACK, program often crashed, because I use append, with this eliminate deleted records.
And modified database I can process with DC_WorkArea2Excel.

At this moment problem is solved, (only still do not know if it is possible direct use RemoveDiplicities in excel.)

Thanks for solution ideas,

Viktor