Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doesn't work with unsigned bigint numbers #12

Open
fredkilbourn opened this issue Jun 5, 2020 · 3 comments
Open

Doesn't work with unsigned bigint numbers #12

fredkilbourn opened this issue Jun 5, 2020 · 3 comments

Comments

@fredkilbourn
Copy link

const source = toRegexRange( "0", "18446744073709551615" );
const regex = new RegExp(`^${source}$`);
console.log( regex.test( "71847175378514" ) );
console.log( regex.test( "18446744073709551615" ) );

//outputs true, then false

@mr-bronson
Copy link

Without even looking beyond the first few lines of code, this obviously doesn't work right for anything outside of the range Number.MIN_SAFE_INTEGER to Number.MAX_SAFE_INTEGER, i.e. -(253-1) to (253-1).

It's a pretty sad and short-sighted design choice. The input starts as a string and results in a string. At no point should it have been turned into a number; that's just dumb. Your code just needs to be aware of how numbers work and manipulate strings according to basic number theory. At that point, your code's only limits are other things, like available memory.

It doesn't matter if your code has been "validated against more than 2.78 million test assertions" if you don't document the obvious limits outside of which it fails to work. (And don't just document the errors and silently provide erroneous answers; throw an error to say you don't know how to count that high!) An error is always better than a wrong answer.

It's just one of many design fails. For example, the function accepts input strings as hexadecimal, and even sometimes outputs answers in hexadecimal, e.g.
toRegexRange("0x2", "0x3") -> "(?:0x2|0x3)"
But inconsistently:
toRegexRange("0x2", "0x4") -> "[2-4]"

Not to mention something like this unholy mess:
toRegexRange("1e200", "2e200") ->
"(?:1[e-.][+-9][2-0]00|1.900000000000002e+200[0-9]|1.99[0-e][0-+][0-2]00[0-undefined][0-undefined][0-undefined][0-undefined][0-undefined][2-undefined][e-undefined][+-undefined][2-undefined][0-undefined][0-undefined][0-9]{3}|1.99999[e-9][+-9][2-0]00|1.9999999[0-e][0-+][0-2]00[0-undefined][0-undefined][3-undefined][e-undefined][+-undefined][2-undefined][0-undefined][0-undefined][0-9]|1.99999999[e-9][+-0][2-0]00|1.999999999[0-e][0-+][0-2]00[2-undefined][e-undefined][+-undefined][2-undefined][0-undefined][0-undefined][0-9]|1.9999999999[e-9][+-e][2-+][0-2]0|1.99999999999[e-9][+-0][2-0]0[0-2]|1.999999999999[0-e][2-+][e-2][+-0][2-0][0-undefined][0-undefined][0-9]{2}|1.99999999999999[e-9][+-e][2-+][0-2]0|[12][.-e][9-+][9-2][9-0][9-0][9-undefined][9-undefined][9-undefined][9-undefined][9-undefined][9-undefined][9-undefined][9-undefined][9-undefined][9-undefined][9-undefined][e-undefined][+-undefined][2-undefined][0-undefined][0-undefined])"

(On the bright side, that one doesn't even compile, so you're going to know it's wrong. It's the ones that happen to be silently wrong that are the bigger problem.)

Did you not know you were programming in JavaScript, which has implicit type conversions, no dedicated integer type and similar treacherous things? If you don't anticipate and catch errors yourself, it's going to silently give you garbage output.

Even in range, this isn't very efficient, e.g. toRegexRange("1e13", "2e13") could simply be "(?:1[0-9]{13}|20{13})", but instead it's:

// [space added to show structure]
"(?:1000000000000[0-9]
   |100000000000[1-9][0-9]
   |10000000000[1-9][0-9]{2}
   |1000000000[1-9][0-9]{3}
   |100000000[1-9][0-9]{4}
   |10000000[1-9][0-9]{5}
   |1000000[1-9][0-9]{6}
   |100000[1-9][0-9]{7}
   |10000[1-9][0-9]{8}
   |1000[1-9][0-9]{9}
   |100[1-9][0-9]{10}
   |10[1-9][0-9]{11}
   |1[1-9][0-9]{12}
   |20000000000000)"

@jonschlinkert
Copy link
Member

It's a pretty sad and short-sighted design choice.

Hey there, welcome to open source. This is where people collaborate to make code better for everyone. When I created this project it worked precisely for what I needed, and the use cases your mentioning would never be permitted in places where this was used. If you want to use it for something else, do a PR. If you don't, why are you so using so many words to say so little?

according to basic number theory

lol, "number theory"

If you don't document the obvious limits outside of which it fails to work.

Why would one document something obvious? Or were you trying to articulate something else. Please take your time.

Not to mention something like this unholy mess:

Garbage in, garbage out.

Even in range, this isn't very efficient, e.g. toRegexRange("1e13", "2e13") could simply be "(?:1[0-9]{13}|20{13})", but instead it's:

Where is your regex range code so I can see how it's supposed to be done? Or really any examples you can show me of code you've written so we can all learn.

Since you know how it should be done, it would be a shame if you kept that all to yourself.

@mr-bronson
Copy link

according to basic number theory

lol, "number theory"

Not sure why that's funny. What I mean is the basic rules about how regular numbers work. For example you have n different characters used as digits, which makes it base n, you have most significant digits on the left, each digit to the left of another digit is worth n times as much, etc.

If you don't document the obvious limits outside of which it fails to work.

Why would one document something obvious? Or were you trying to articulate something else. Please take your time.

Sure. Obvious to a programmer looking at the code. Not obvious to a user/consumer of the module. The essence of a good module is having a well-defined interface and rules about what the module does. Packaging it in a module, among other things, is meant to hide irrelevant implementation details, so that it can be used in a variety of contexts as long as the well-defined interface is respected by the module and the module users.

Garbage in, garbage out.

That's not a virtue. Within a module, of course you'll use functions that don't check inputs as much, because you can make assumptions about what code is calling certain functions. But the exposed functions of a module should be checking inputs. To some extent you understood this, because you made a function to check inputs and threw TypeError. It's just that your checking wasn't complete.

Where is your regex range code so I can see how it's supposed to be done? Or really any examples you can show me of code you've written so we can all learn.

Since you know how it should be done, it would be a shame if you kept that all to yourself.

You seem to be getting defensive or upset like I'd called your daughter ugly. It's just code, dude. I've written a lot of crappy code myself that did what I needed it to at the time. However, when you publish it as a module for others to use, then of course it will be exposed to scrutiny. And it should be, because your code is being used on websites that purport to provide working and correct regular expressions, and they don't.

I notice that the original report didn't get the slightest acknowledgment in over four years, but when I commented, you replied in minutes -- not to acknowledge reported defects, but to get defensive. Interesting. What do you think the purpose of issues are if not to report, acknowledge and work on defects? Like you said,

This is where people collaborate to make code better for everyone.

And so that should be the goal. If your only concern is "it worked precisely for what I needed", then that isn't what making it better for everyone looks like.

As to fixing this issue: Sure, if I get some extra time, I can code up what I had in mind to show you. But a simplistic fix for your existing code could at least:

  • disallow number inputs for which Number.isSafeInteger(n) returns false.
  • convert number inputs and string inputs to bigint, and do your arithmetic only with bigints.

Hope this helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants