How to add unique items to array?



jzakiya Apr 17

I want to build an array of unique numbers that are being generated by a process.

In Ruby it would look something like this:

arry << num unless arry.include? num

Is the following the fastest way to do it in Nim?

if not (num in arry): arry.add(num)


jzakiya Apr 17

Or in Ruby, I can just keep adding numbers to the array and then return a unique array at the end.

a = [1,3,4,5,6,7,4,3,9,10,11,12,5]

a.uniq!  # => [1, 3, 4, 5, 6, 7, 9, 10, 11, 12]


amalek Apr 17

Nim
if num notin arry: arry.add(num)

Please note that you should use notin instead of negating the in operator. This is the preferred way. It should also be the
fastest. The only thing that the in operator does is call the contains proc on the given array, so there should be no more
overhead than necessary (notin is simply a template which evaluates to something similar to what you wrote, but since
templates are evaluated at compile-time they have no run-time cost).
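A minimal sketch of the check-then-add pattern described above, run over the sample numbers from the Ruby example. The helper name addUnique is hypothetical and not part of the standard library.

```nim
# `addUnique` is a hypothetical helper name used only for illustration.
proc addUnique(arry: var seq[int], num: int) =
  # `notin` expands to `not contains(arry, num)`, which scans the seq linearly.
  if num notin arry:
    arry.add(num)

var arry: seq[int] = @[]
for num in [1, 3, 4, 5, 6, 7, 4, 3, 9, 10, 11, 12, 5]:
  arry.addUnique(num)

echo arry  # @[1, 3, 4, 5, 6, 7, 9, 10, 11, 12]
```

Each call scans the whole seq, so the cost per insertion grows with the number of items already stored.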

That said, if you want something like Ruby's uniq, have a look at deduplicate. You can find it here, with lots of other
Ruby-like procs for functional programming.
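As a rough illustration of the uniq-style approach, the sketch below collects everything first and removes duplicates at the end with deduplicate from the sequtils module. The input values are taken from the Ruby example earlier in the thread.

```nim
import sequtils

let a = @[1, 3, 4, 5, 6, 7, 4, 3, 9, 10, 11, 12, 5]

# deduplicate returns a new seq and leaves `a` untouched;
# the first occurrence of each value is kept, in its original position.
let unique = a.deduplicate()

echo unique  # @[1, 3, 4, 5, 6, 7, 9, 10, 11, 12]
```

For unsorted input, deduplicate checks every item against the result built so far, so it slows down noticeably as the seq grows.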


Stefan_Salewski Apr 17

amalek, I think "not (a in b)" is fine and the same as "a notin b". So the notin keyword just saves the parentheses.

And all in/contains operations on arrays should be O(N) of course, so they will not be that fast for large arrays. So we may
prefer to use sets or hashsets for data which should be unique. There was a thread about a related topic once:

https://forum.nim-lang.org/t/2611#16168
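A hedged sketch of the hash set idea: a HashSet from the sets module does the membership check, while a seq keeps the numbers in arrival order. It assumes a Nim version that provides initHashSet (older releases spelled the constructor initSet); the variable names are illustrative.

```nim
import sets

# Assumption: initHashSet is available (older Nim versions used initSet).
var seen = initHashSet[int]()
var arry: seq[int] = @[]

for num in [1, 3, 4, 5, 6, 7, 4, 3, 9, 10, 11, 12, 5]:
  # containsOrIncl returns true if num was already in the set;
  # otherwise it inserts num and returns false.
  if not seen.containsOrIncl(num):
    arry.add(num)

echo arry      # @[1, 3, 4, 5, 6, 7, 9, 10, 11, 12]
echo seen.len  # 10
```

The set lookup avoids rescanning the seq on every insertion, at the cost of storing each number twice.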


jzakiya Apr 17

Hey thanks @amalek . I'll test and see what works best for my case.

miran Apr 17

an array of unique numbers

use sets / HashSets?


jzakiya Apr 17

So @miran | @Stefan_Salewski using sets is faster than deduplicating an array in Nim?


Also, I ultimately want to sort the array. Will sets give me this automatically, or do I need to manually do this?


Stefan_Salewski Apr 17

jzakiya, when you have to sort, you may need a seq/array indeed. I think sorted sets only preserve insertion order. O(N)
may be fast if N is small of course. See the reply of cblake to my above mentioned thread -- I think I used that approach
once.

amalek Apr 17

Also, I ultimately want to sort the array. Will sets give me this automatically, or do I need to manually do this?

You need a seq for that. Sets aren't guaranteed to be ordered (even though they seem to remember insertion order in
practice, so if you want to rely on implementation-specific behaviour you could do it). Also, sets only accept a limited
number of types. See here.
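One possible way to combine both suggestions, sketched under the same initHashSet assumption as before: collect unique values in a HashSet, then copy them into a seq and sort that seq explicitly.

```nim
import algorithm, sequtils, sets

# Assumption: initHashSet is available (older Nim versions used initSet).
var seen = initHashSet[int]()
for num in [12, 3, 7, 3, 1, 12, 9]:
  seen.incl(num)

# A HashSet has no guaranteed order, so copy its items into a seq and sort it.
var sortedNums = toSeq(seen.items)
sortedNums.sort(system.cmp[int])

echo sortedNums  # @[1, 3, 7, 9, 12]
```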


jzakiya Apr 17

Hey @Stefan, I did an import macros, strutils but I get an undeclared identifier: 'deduplicate' compiler error.

SolitudeSF Apr 17

@jzakiya, it's in sequtils


Araq  Apr 17

https://nim-lang.org/docs/theindex.html CTRL+F: "deduplicate" --> https://nim-lang.org/docs/sequtils.html#deduplicate,openArray%5BT%5D

Yes I know jzakiya is immune to Nim's index/documentation, but maybe others benefit from my occasional hints.


jzakiya Apr 17

Thanks @SolitudeSF .

@Araq, when you're coming from one language to another (software or human) what is obvious to one fluent in the
language is not to someone trying to learn the language. As has been brought up many, many times before, Nim's
documentation needs to seriously improve, to make it more accessible to more people, who are coming from many
different language paradigms.

Also, it is better to attract people with honey than vinegar.


Araq  Apr 17

Another red herring. I never got an answer from you about what is wrong with using the index to find things. (In this case,
where to find the symbol deduplicate.)

jzakiya Apr 18

So @miran and @Stefan_Salewski, I used a set to replace a seq, and it is pretty fast for this case, as I don't have to
worry about deduplicating the array, which was verrrry slow as the array size increased. Sets did create a much better
logical in-place replacement for this case. The structural problem with sets is, it only takes up to uint16 size (2^16-1 =
65,535) elements, and my sets could take up to ~250K elements, so when I do card(set) (in place of arry.len) I start
getting wrong answers once the input size passes a certain point. Is there a way to increase the size, or work around this?

dataman Apr 18

@jzakiya

Is there a way to increase the size, or work around this?

Try IntSet.


jzakiya Apr 18

Hey @dataman, can you show how to use this to replace sets?
Below is a code prototype using sets. What would be the equivalent with IntSet?

proc example(a, b, c) =
  ....
  var aset: set[uint16]
  ....
  while n < max:
    ....
    aset = {}
    for b in x..y:
      .....
      incl(aset, k.uint16)
    .....
    count += card(aset)
  ....


dataman Apr 18

import intsets
var s = initIntSet()
s.incl(1)
s.incl(10)
echo s
echo s.card
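To connect this with the size question above, here is a small sketch showing an IntSet holding values beyond the uint16 range (0..65535). The specific numbers are made up for illustration.

```nim
import intsets

var s = initIntSet()

# Values well above 65535 are fine here; the repeated 70_000 is stored once.
for k in [1, 10, 70_000, 250_000, 70_000]:
  s.incl(k)

echo s.card        # 4 distinct members
echo 250_000 in s  # true
```

In the loop from the prototype, this would mean calling incl(aset, k) without the uint16 conversion and creating a fresh set with initIntSet() instead of assigning {}.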


jzakiya Apr 18

Hey @dataman, thanks for the info on intsets.

FYI, intsets has a problem compiling the card|len method on 0.17.2 but not on 0.18.0. Also, while intsets indeed
will take the correct number of set members, it's at least 3x slower than using sets, across the board. So for this case,
using a seq array is the better alternative.

Well, at least I learned some new things today. :)


ErikCampobadal May 15

use a sequence:

Nim
var numbers: seq[int] = @[]  # declare the type and start with an empty seq
numbers.add(1)
numbers.add(2)
# ...
