Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a &key option to the order command #1570

Closed
hanche opened this issue Jul 3, 2022 · 6 comments
Closed

Add a &key option to the order command #1570

hanche opened this issue Jul 3, 2022 · 6 comments

Comments

@hanche
Copy link
Contributor

hanche commented Jul 3, 2022

The order command currently has the option &less-than, which should be a function to be called with two values from the list to be ordered in order to determine is the first is to be placed before the second.

In applications, the supplied function often needs to extract or compute a sort key from each of the two list item, and then compare the keys. This can lead to extra verbosity, code that is harder to read. and a (somewhat) higher risk of bugs.

Proposal: Add an option &key=$nil to the order command. Its value should be a function of one argument, to be applied to each item in the list. Then the less-than comparison is applied to the resulting keys instead of the original values.

The proposal is inspired by the Common Lisp sort functiuon, which has such an option. It was recently discussed briefly in chat.

For reference, here is a simple elvish implementation of the proposal.

fn order {|&reverse=$false &less-than=$nil &key=$nil @args|
  use builtin
  set key = (coalesce $key $put~)
  set less-than = (coalesce $less-than {|a b| == -1 (compare $a $b)})
  builtin:order $@args &reverse=$reverse ^
    &less-than={|a b| $less-than ($key $a) ($key $b)}
}

Example usage:

⬥ var t = [[c 1] [a 3] [b 2]]
⬥ order $t &key={|x| put $x[1] }
⮕ [c 1]
⮕ [b 2]
⮕ [a 3]
@krader1961
Copy link
Contributor

Other prior art is the Python sorted command: https://docs.python.org/3/howto/sorting.html. However, note this statement in that documentation:

The value of the key parameter should be a function (or other callable) that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.

If the same optimization is done in Elvish then that clearly makes a &key option worthwhile, from a performance perspective, compared to an equivalent &less-than function. If that optimization is implemented then my opinion of adding a &key option goes from slightly opposed to greatly in favor of doing so.

P.S., I can't help proposing that order be renamed sorted at the same time. 😸

@hanche
Copy link
Contributor Author

hanche commented Jul 4, 2022

@krader1961 Yeah, I thought of that optimization too, but neglected to mention it, as I was in a rush to leave. Of course, it comes at a slight cost in space in order to save time, but that is probably not a strong argument against it. In any case,, that is easily resolved by writing a bit more go code, so the optimization is not applied when $key is $nil.

Below is an elvish implementation using this optimization. Incidentally, while testing it, I discovered that the documentation for order does not match the actual behaviour: If you explicitly give it the option &less-than=$nil, it raises an error.

fn order {|&reverse=$false &less-than=$nil &key=$nil @args|
  use builtin
  if $key {
    set less-than = (coalesce $less-than {|a b| == -1 (compare $a $b)})
    if (eq $args []) { all } else { put $@args } ^
    | each {|item| put [($key $item) $item]} ^
    | builtin:order &reverse=$reverse ^
      &less-than={|a b| $less-than $a[0] $b[0]} ^
    | each {|pair| put $pair[1]}
  } elif $less-than {
    builtin:order &reverse=$reverse &less-than=$less-than $@args
  } else {
    builtin:order &reverse=$reverse $@args
  }
}

@krader1961
Copy link
Contributor

krader1961 commented Jul 21, 2022

Incidentally, while testing it, I discovered that the documentation for order does not match the actual behaviour: If you explicitly give it the option &less-than=$nil, it raises an error.

This is because the value assignable to &less-than is strongly typed:

type orderOptions struct {
    Reverse  bool
    LessThan Callable
}

That could be changed to allow any value and add a test for $nil or a Callable to the body of the order function but it's not obvious that's a good trade-off to support something that would rarely (never?) be used.

On the other hand, the option handling framework could be modified to allow $nil for a Callable option since the "zero value" for a Callable var is nil. There are only two other functions that have a Callable option (eval and time). All three functions use the same pattern of testing if the option is equal to nil. And, if you think about it, that's really the only sensible way to do it so it makes sense for the option parsing code to allow $nil for a Callable option. I'll create a P.R. that does that.

@krader1961
Copy link
Contributor

krader1961 commented Jul 22, 2022

It turned out to be simpler than I expected to support order &less-than=$nil. It did take me a couple of hours to figure out due to unfamiliarity with the Go reflect package but the change itself required just five new lines (plus eleven more for unit tests). I should also have an order &key option added in the next day or two. I'm hoping that benchmarks show it is significantly faster than using &less-than.

@krader1961
Copy link
Contributor

krader1961 commented Jul 24, 2022

My earlier benchmark results made the &less-than option performance appear better than it really is due to a bug in my benchmark. Below are the correct results. Note that, as expected, as the number of items sorted increases the "simple" (no sort options) and "key" variants scale linearly with the input while the "less-than" variant is greater than linear in its run time. For large inputs (e.g., 10^5 items) using &key is almost 50x faster than &less-than. But for even just 10 items &key is 4x faster. I'm actually surprised &less-than isn't worse.

Benchmarking 10^1 (10) strings:
simple       0.001740
key          0.001969
less-than    0.007779

Benchmarking 10^2 (100) strings:
simple       0.000527
key          0.007207
less-than    0.103583

Benchmarking 10^3 (1000) strings:
simple       0.002862
key          0.053202
less-than    1.454200

Benchmarking 10^4 (10000) strings:
simple       0.032566
key          0.569946
less-than   21.170543

Benchmarking 10^5 (100000) strings:
simple       0.341759
key          5.622884
less-than  260.840497
use file
use math
use path
use str

range 1 6 | each {|p|
    var n = (math:pow 10 $p)
    echo
    echo "Benchmarking 10^"$p" ("$n") strings:"
    var @l = (range $n | each {|x| put (randint (/ $n 2))" line "$x })

    time &on-end={|d| printf "%-10s %10.6f\n" simple $d } {
        order $l | to-lines > /tmp/simple
    }
    time &on-end={|d| printf "%-10s %10.6f\n" key $d } {
        order $l &key={|v| num (str:split " " $v &max=2 | take 1) } | to-lines > /tmp/key
    }
    time &on-end={|d| printf "%-10s %10.6f\n" less-than $d } {
        order &less-than={|a b|
            var a-key = (num (str:split " " $a &max=2 | take 1))
            var b-key = (num (str:split " " $b &max=2 | take 1))
            < $a-key $b-key
            } $l | to-lines > /tmp/less-than
    }
}

@krader1961
Copy link
Contributor

I was waiting for my earlier change to support comparing boolean values to be merged before creating a pull-request for this feature. Now that comparing booleans is possible (see issue #1585) I'll proceed with creating a pull-request to resolve this issue.

krader1961 added a commit to krader1961/elvish that referenced this issue Sep 28, 2022
krader1961 added a commit to krader1961/elvish that referenced this issue Sep 28, 2022
krader1961 added a commit to krader1961/elvish that referenced this issue Sep 28, 2022
@xiaq xiaq closed this as completed in 306e2d7 Nov 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants