I’m currently writing a CLI tool that handles a specific JSON data format. I also want to let the user get a slice of the file’s item array through a command-line option of the form --slice START:END, for example --slice 1:2.
- Should I provide a 0-based or a 1-based index? For example, --slice 1:2 with a 0-based index would start at the second element, and with a 1-based index it would start at the first element.
- And do you think it’s better for END to be inclusive or exclusive? For example, --slice 1:2 would get only one element if END is exclusive, or two elements if it is inclusive.
I know this is all personal taste, but I’m torn between the options and cannot decide, so I thought I’d ask what you think. Maybe that helps me sort my own thoughts a bit. Thanks in advance.
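To compare the four combinations from the question side by side, here is a minimal Python sketch. The function name `apply_slice` and its `base` and `inclusive_end` parameters are made up for illustration and are not part of the tool being discussed; the sketch also assumes non-negative, in-range indices.

```python
def apply_slice(items, start, end, *, base=0, inclusive_end=False):
    """Return the requested slice under the chosen convention.

    base: 0 or 1, the index the user sees for the first element.
    inclusive_end: whether END itself belongs to the result.
    """
    lo = start - base                                # Python's 0-based start
    hi = end - base + (1 if inclusive_end else 0)    # Python's exclusive stop
    return items[lo:hi]


items = ["a", "b", "c", "d"]
for base in (0, 1):
    for inclusive_end in (False, True):
        result = apply_slice(items, 1, 2, base=base, inclusive_end=inclusive_end)
        print(f"base={base} inclusive_end={inclusive_end} -> {result}")
```

With `--slice 1:2` on `["a", "b", "c", "d"]`, the 0-based variants give `["b"]` (exclusive) and `["b", "c"]` (inclusive), while the 1-based variants give `["a"]` and `["a", "b"]`.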
I’ve been working on this problem for my own language, and have landed on something clearer than just following a convention. Basically you use [] and () to specify whether the left and right bounds are included or not (based on interval notation: https://en.wikipedia.org/wiki/Interval_(mathematics)#Including_or_excluding_endpoints). E.g. for your case:
--slice [1:5) # include the left index, don't include the right index
--slice [1:5] # include both the left and right index
--slice (1:5] # don't include the left index, include the right index
--slice (1:5) # don't include the left or right index
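As a rough illustration of how such bracket notation could be parsed, here is a minimal Python sketch. The grammar, the name `parse_interval`, and the choice of 0-based indices are assumptions made for this example; this is not the commenter's actual implementation.

```python
import re

# Assumed grammar: "[" or "(", START, ":", END, then "]" or ")".
_INTERVAL = re.compile(r"([\[(])(\d+):(\d+)([\])])")


def parse_interval(spec):
    """Turn e.g. '[1:5)' into a 0-based, end-exclusive (start, stop) pair."""
    m = _INTERVAL.fullmatch(spec)
    if m is None:
        raise ValueError(f"invalid slice spec: {spec!r}")
    left, start, end, right = m.groups()
    start, end = int(start), int(end)
    if left == "(":      # excluded left bound: begin one element later
        start += 1
    if right == "]":     # included right bound: stop one element later
        end += 1
    return start, end


items = list(range(10))
for spec in ("[1:5)", "[1:5]", "(1:5]", "(1:5)"):
    start, stop = parse_interval(spec)
    print(spec, items[start:stop])
```

Normalizing every form to one internal representation (here, Python's half-open slice) keeps the rest of the program independent of which brackets the user typed.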
Potentially not relevant to your case, but my version supports an end keyword which you can do math on, similar to Python’s negative indexing:
[2:end-3] # start at index 2 (included) and go through to the third-from-last index (included)
(end-3:end] # start at the third from last (excluded) and go to the end (included)
Personally I’m a fan of 0 indexing, but for your context, I think it would depend on how the user sees what they’re slicing. E.g. if it was pages with page numbers, the numbers would indicate if it was 0 or 1 index based. If there’s nothing to actually show the user, I think picking something reasonable and documenting it well is probably the best bet.
Anybody capable of using a CLI knows that the right answer is:
- index from 0
- end is exclusive.
Dijkstra points out why: https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831.html
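For anyone who does not want to follow the link: the practical properties usually cited for 0-based, end-exclusive ranges can be checked directly in Python, whose slices follow that convention. This is only a small illustration with made-up values, not a summary of the whole note.

```python
items = list("abcdefgh")
start, mid, end = 2, 5, 7

# The length is simply end - start, with no +1 correction.
assert len(items[start:end]) == end - start

# Adjacent ranges share a bound and concatenate without overlap or gap.
assert items[start:mid] + items[mid:end] == items[start:end]

# An empty range needs no special case: start == end.
assert items[mid:mid] == []

# With 0-based indexing, the whole sequence is 0 <= i < len(items).
assert items[0:len(items)] == items
```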
I agree with the other comment, which argues for matching what users expect. I think 1-based is the logical choice here.
On the other hand, ‘0’ is often also used as the last element, or points to “the entire match”, for example, whatever that is. I feel that outside of programming languages, for the end user, the answer is not that clear, which is why I created this topic.
I’ll read the linked article and rethink this. Maybe I’ll introduce another option to make the index 0-based (or, the other way around, 1-based).
often ‘0’ is also used as the last element
Where? I’ve literally never heard of this convention.
RegExes. For instance, in JavaScript, 'foobar'.match(/(foo)(bar)/) is ['foobar', 'foo', 'bar'].
Now that you ask, I don’t have any example of this. I know the program head accepts negative numbers to count from the last element backwards, as in ls -1 | head -n -1, but it does not start at 0. So yeah, 0 as the last element might not be as common as I thought.
-1 is common. I’ve at least seen it in Python.
I know some programming languages use : for ranges and it is more legible if you support negative indices, but I think START-END is more natural reading and I’d use : for START:COUNT instead, e.g. 3:4 for 4 elements starting from 3, so elements 3,4,5,6 or 3-6.
You can even support both formats! (Feature creep warning)
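If both notations were supported, the two are easy to convert into each other. A minimal sketch, assuming START-END is inclusive on both ends (as in the 3-6 example above); the function names are hypothetical.

```python
def count_to_range(start, count):
    """START:COUNT -> inclusive (first, last), e.g. 3:4 -> (3, 6)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return start, start + count - 1


def range_to_count(first, last):
    """Inclusive START-END -> (start, count), e.g. 3-6 -> (3, 4)."""
    if last < first:
        raise ValueError("last must not be smaller than first")
    return first, last - first + 1


print(count_to_range(3, 4))   # elements 3, 4, 5, 6
print(range_to_count(3, 6))   # 4 elements starting at 3
```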
A dash is a bit problematic from a practical point of view. For example, I allow single numbers without a colon, like just 6, which would be interpreted as 6:6. And each element is optional as well, which would make -6 either a negative number, a command-line option, or a range? Some languages also use dots (..) instead. If I ever want to support negative numbers, then the hyphen, dash or minus character would be in the way.
I mean, I could do something like duck typing, where I accept “any” non-digit character (maybe except the minus and plus characters) with a regex. Hell, even a space could be used… But I think in general a standardized character is the better option for something like this, because from a practical point of view there is no real benefit for the end user in using a different character, in my opinion. Initially I even thought about which format to use, and a colon is pretty much set in stone for me.
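To show why the colon keeps the grammar unambiguous even with negative numbers, here is a minimal Python sketch of a parser for the rules described above (bare N means N:N, both parts optional). The name `parse_slice` and the use of `None` for an omitted bound are assumptions for this example, not the tool's actual code.

```python
import re

_SINGLE = re.compile(r"-?\d+")                 # bare number, e.g. "6" or "-6"
_RANGE = re.compile(r"(-?\d+)?:(-?\d+)?")      # "1:2", "2:", ":5", ":"


def parse_slice(spec):
    """Parse a --slice value into (start, end); None marks an omitted bound."""
    if _SINGLE.fullmatch(spec):
        n = int(spec)
        return n, n                            # "6" is shorthand for "6:6"
    m = _RANGE.fullmatch(spec)
    if m is None:
        raise ValueError(f"invalid slice spec: {spec!r}")
    start, end = m.groups()
    return (
        int(start) if start is not None else None,
        int(end) if end is not None else None,
    )


for spec in ("6", "1:2", "2:", ":5", "-3:-1"):
    print(spec, parse_slice(spec))
```

Because the separator is a colon, a minus sign can only ever be part of a number, so "-6" and "-3:-1" parse without guessing, while something like "6-6" is rejected.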