Excel remove duplicates

This forum is for eXpress++ general support.
Message
Author
Victorio
Posts: 621
Joined: Sun Jan 18, 2015 11:43 am
Location: Slovakia

Re: Excel remove duplicates

#11 Post by Victorio »

Ok, understand, thanks.

Victorio
Posts: 621
Joined: Sun Jan 18, 2015 11:43 am
Location: Slovakia

Re: Excel remove duplicates

#12 Post by Victorio »

My plan to sort records in DBF via INDEX UNIQUE failed because my index key is longer than 240 characters,
I also tried Sort, it takes it, but again I would have to deal with comparing a lot of values (35 fields).
I haven't tried the activeX SORT method yet because I'm very curious as to why RemoveDuplicates doesn't want to take my Columns as an array.
I've tried different syntaxes but it doesn't want to accept anything except a single number

For example i tryed this :

x:=array(3)
x[1]:=0
x[2]:=1
x[3]:=2
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates((x),Header:=xlYes)

or :
oSheet:Range("A1:F15"):RemoveDuplicates(Columns:=Array(1,2), Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates("A:C", Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates(oSheet:Columns("A:C"), Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(,Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(("A:E"),Header:=xlYes)

still get error Invalid index

Does anyone have an example of the source code (not VBA, because it doesn't take it) where the RemoveDuplicates function is used with the columns and xl parameters?

Syntax of VBA is RemoveDuplicates Columns:=varArray1(index), Header:=xlNo

skiman
Posts: 1185
Joined: Thu Jan 28, 2010 1:22 am
Location: Sijsele, Belgium
Contact:

Re: Excel remove duplicates

#13 Post by skiman »

Hi Victorio,

I was wondering what you want to do? It can't be that problem with Xbase++.

There are 34 colums/fields which you want to test if they are duplicated? Do you want to check if all fields are the same, or if ONE of these fields is the same?

An indexkey can't be that long, but you could use an array, depending on the size of the file. If you create an array with { recno() , field1+field2+... } then you can search for duplicates in the array. When you find a duplicate you have the recno() to process as you want.

Is you data in a dbf at start or is it an xls you want to process? Maybe post a sample of your data?
Best regards,

Chris.
www.aboservice.be

Victorio
Posts: 621
Joined: Sun Jan 18, 2015 11:43 am
Location: Slovakia

Re: Excel remove duplicates

#14 Post by Victorio »

Hi Chris,
At the beginning I have a DBF (Visual Foxpro structure) and later I generate XLSX from it
I originally wanted to solve it through Excel, but I don't know because of the problem.
I need to delete records that have the same 35 out of 36 fields.
I wanted to avoid working with arrays because I don't know how fast it will be with large databases.
But yesterday I tried to program a different way using DBF using Var2Bin() , it looks like it should work, but I'm just going to test it:

Code: Select all

SELECT 1
use d:\#POKUSY\TEST_Excel\POKUS2 EXCLUSIVE
go top
do while !eof()
	replace unik with .T.	// 
	skip
enddo

go top
prvy:=1
duplicit:=0
lastrec:=""
for i=1 to 36
	lastrec:=lastrec+Var2Bin(FieldGet(i))	
next

do while !eof()
	aktrec:=""
	for i=1 to 36
		aktrec:=aktrec+Var2Bin(FieldGet(i))
	next
	if aktrec==lastrec .and. prvy!=1
		replace unik with .F.	
		duplicit++
	endif
	if prvy=1
		prvy=0
	endif
	lastrec:=aktrec
	skip
enddo
close all

So, this is only first try without code optimization

User avatar
SlavkoDam
Posts: 86
Joined: Wed Apr 27, 2022 10:12 am
Location: Negotin, Serbia
Contact:

Re: Excel remove duplicates

#15 Post by SlavkoDam »

Hi Viktor,

Since you want to compare 35 out of 36 fields for equality, my Excel code can be modified not to compare each column values as it is now, but to compare values of entire rows, or values of a range of columns. So, there will be only one compare statement per each row, and no need to sum values of columns in the range, as in your test program. I believe you test program will work too.

Slavko
Best regards,

Slavoljub Damnjanovic
SD-SoftDesign, Alaska Software Technology Partner
https://www.sd-softdesign.com
https://www.sd-softdesign.rs

k-insis
Posts: 100
Joined: Fri Jan 28, 2011 4:07 am

Re: Excel remove duplicates

#16 Post by k-insis »

How many chars?!

As it is only for unique, you should change method how you determine uniqueness of some record data.

Put everything into expression into function and calculate SHA (hash value) of fields you want to account for unique.
Hash functions are extremely fast in any but musem grade computers.

FUNCTION MySomethingUq()
LOCAL cSomething := ""
LOCAL cHash := ""

//concat long string
cSomething := SOMEFIELD+SOMEFIELD2+DTOS(SOMEDATE)+STRZERO(SOMENUM,15,4)
cHash := Char2Hash( cSomething, 256 ) //you might use 128 too as this is only for comparison not security use

RETURN cHash

DBUseArea(.t.,,"somefile","hashwoes",.t.,.t.)
hashwoes->(dbcreateindex("someuniq","MySomethingUq()",{|| hashwoes->(MySomethingUq()) }, .T.))



Links:

https://doc.alaska-software.com/content ... 2hash.html


Victorio wrote: Tue Feb 27, 2024 8:09 am My plan to sort records in DBF via INDEX UNIQUE failed because my index key is longer than 240 characters,
I also tried Sort, it takes it, but again I would have to deal with comparing a lot of values (35 fields).
I haven't tried the activeX SORT method yet because I'm very curious as to why RemoveDuplicates doesn't want to take my Columns as an array.
I've tried different syntaxes but it doesn't want to accept anything except a single number

For example i tryed this :

x:=array(3)
x[1]:=0
x[2]:=1
x[3]:=2
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates((x),Header:=xlYes)

or :
oSheet:Range("A1:F15"):RemoveDuplicates(Columns:=Array(1,2), Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates("A:C", Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates(oSheet:Columns("A:C"), Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(,Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(("A:E"),Header:=xlYes)

still get error Invalid index

Does anyone have an example of the source code (not VBA, because it doesn't take it) where the RemoveDuplicates function is used with the columns and xl parameters?

Syntax of VBA is RemoveDuplicates Columns:=varArray1(index), Header:=xlNo

Victorio
Posts: 621
Joined: Sun Jan 18, 2015 11:43 am
Location: Slovakia

Re: Excel remove duplicates

#17 Post by Victorio »

Yes, i think about use HASH, but now works fine also with Var2Bin without problems.
First I sort DBF to other temporary DBF, then in cycle delete sign duplicates, then delete it,
and final i append to new database.
When I use PACK, program often crashed, because I use append, with this eliminate deleted records.
And modified database I can process with DC_WorkArea2Excel.

At this moment problem is solved, (only still do not know if it is possible direct use RemoveDiplicities in excel.)

Thanks for solution ideas,

Viktor

Post Reply