-
Notifications
You must be signed in to change notification settings - Fork 25
Add language-specific node filtering #190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This code also adds a bunch of node names to Sexp::NODE_NAMES for JS and PHP. It is probably not exhaustive.
* Fixed bug setting line number for hash literals to line of opening brace. Which changes line numbers in the tests here.
By observation, either an external parser is going to finish quickly or it probably isn't going to finish at all (at least within the total run timeout thresholds). Currently the timeout is 300s (5 minutes) out of a total run timeout of 900s (15m). That means 3 large files will kill off an entire project run. I've found that even rather javascript large files all parse within 2 seconds and those that don't don't parse within 5 minutes. Dropping the thresholds to 10s seems safe and will allow more projects to finish analyzing to report SOMETHING. Besides, all the files I've seen time out were vendored versions of react/ember and the like. They're huge and they're NOT actually part of the project. Losing them quickly will not reduce the quality of analysis.
I should probably make these 1 column and put it at the end of the file for maintainability.
So now a .codeclimate.yml file can have an entry like: - ruby: filters: - "(defn, [m /^test_/], _, ___)" and that will filter out all methods whose names start with "test_". This requires releases of flay and sexp_processor to pass tests. Pushing now to get eyeballs on.
AFAIK, I have NO projects to test against from the testing group. So I don't know if that value is too small. I suspect it'll be on par with my data from javascript and PHP tho (ie, if it can't parse in 10s, it can't parse in 5m).
Includes a failing spec from: #188 made to pass by filtering "(hash ___)" (for now?).
As extra datapoints:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
Is the default parsing timeout value of 300
useful anymore? Would it make sense to change that default to 10
and override it to 30
within the ruby parser?
Would it make sense to define some default filters when none are defined with the codeclimate yml configuration?
@@ -197,5 +235,18 @@ def self.from_remediation_amount(amount) | |||
def engine_conf | |||
EngineConfig.new({}) | |||
end | |||
|
|||
def filtered_engine_conf *patterns | |||
EngineConfig.new({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this indentation intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intentional, yes. Good? probably not. It's how emacs does it by default (more like lisp indentation).
expect([locations[0].begin_line, locations[0].end_line]).to eq([2, 7]) | ||
expect([locations[1].begin_line, locations[1].end_line]).to eq([11, 16]) | ||
expect([locations[0].begin_line, locations[0].end_line]).to eq([1, 7]) | ||
expect([locations[1].begin_line, locations[1].end_line]).to eq([10, 16]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what caused these line locations to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably fixes to ruby_parser. I have a release coming out soon that makes more of these fixes so there might be more failed tests coming up.
def filters_for(language) | ||
fetch_language(language).fetch("filters", []).map { |s| | ||
Sexp::Matcher.parse s | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the s
variable represent? Does it stand for string? What do you think of fleshing it out to "filter" if we're mapping over filters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I suspect it probably never made sense. 30 or 10 are more more sensible defaults based on the numbers I've gathered. I'd be fine dropping the numbers... but it might be better if data were gathered so that statistics could decide what the numbers should be.
Absolutely. But I suspect at this time it would be premature to decide what those should be. @ABaldwinHunter and others would have more experienced/valid opinions on this matter than I would. |
This is available on the beta channel now: engines:
duplication:
enabled: true
channel: beta
config:
languages:
- ruby |
@gettinToasty Would you mind sharing your |
@dblandin I managed to fix that issue-- the new readme says the format should be
but it should be:
i am having some trouble with the pattern matching though. my main problem is propTypes being marked as duplicates so i tried |
@zenspider should be able to help with the pattern syntax ^ @gettinToasty Could you provide an example case of detected duplication you're trying to filter out? |
@dblandin sure thing! here are some lightly modified code snippets:
which have a Mass 90 duplication issue |
Hrm... wrt the config, here is what I'm using:
(intentionally nuking EVERYTHING, to see full coverage and its corresponding performance differences) |
OK. WRT the code... this is a doozy. I am not fond of the AST that comes back from javascript, but after our munging, this is how it looks: s(:Program,
:module,
s(:body,
s(:ExpressionStatement,
:expression,
s(:AssignmentExpression,
:"=",
:left,
s(:MemberExpression,
:object,
s(:Identifier, :EventBlock),
:property,
s(:Identifier, :propTypes)),
s(:right,
:ObjectExpression,
s(:properties,
s(:ObjectProperty,
:key,
s(:Identifier, :heading),
:value,
s(:MemberExpression,
:object,
s(:MemberExpression,
:object,
s(:Identifier, :PropTypes),
:property,
s(:Identifier, :string)),
:property,
s(:Identifier, :isRequired))),
... so you probably want something more like:
OR:
|
@zenspider would something like |
I'd also really like some feedback on how you (or anyone) thinks that you should be able to see the AST to figure that out. Right now I have a patch on CCD that I run inside docker that dumps it. It is incredibly verbose for javascript (as hinted, the AST is ginormous). I could probably have this set up to output to the debug log if |
@zenspider as someone completely new to this syntax there's no way i would have been able to get anywhere close without seeing the AST haha, but that first expression you gave seems to do the trick really well |
Yeah... The JS parser in particular is a little crazy... I want to expose SOMETHING to make it easier, just not sure what it should be yet... Maybe the filtering should just be against the structure of the tree and not the tree itself? @gordondiggs ? |
@zenspider can you explain the different options there a bit? |
@gordondiggs I've been mulling over this for a week and I think it'd be a mistake... So, the difference... given some code: def x(y)
y + 1
end The raw parse tree looks like: s(:defn, :x, s(:args, :y), s(:call, s(:lvar, :y), :+, s(:lit, 1))) and the structural tree looks like: s(:defn, s(:args), s(:call, s(:lvar), s(:lit))) which is easier to read and perhaps easier to write filters against, but it also means you can't filter on, for example, method names. I can totally imagine people filtering on something like: |
I found a tool: http://astexplorer.net It lets you visualize and explore the AST in a few languages, with the main focus being javascript. It might help (although right now we're massaging our AST for performance reasons which might make it a bit trickier). |
Just pushed PR #193 that allows you to dump the AST to better help with writing filters. |
I'm pretty sure that this has been merged upstream, so I'm going to go ahead and close this PR. Let me know if that's not the case! |
@dblandin i believe this functionality is only on the beta branch and not the main branch; i put up a new PR trying to use filters first with the beta branch which had what looked like some engine changes as well (new issues discovered) but when i removed |
@gettinToasty I know this ticket is old but we're trying to filter out |
99% of the work wound up in sexp_processor, which got released to beta yesterday. This has minor patches against Flay that will get pushed upstream and released.
This also has changes to parsing strategy. In my measurements, if a (especially foreign language) file can't parse in 10 seconds, it can't parse in 5 minutes (the default timeout for CommandLineRunner), basically chewing up 1/3rd of the overall default timeout for no benefit. By changing them to 10s, I timeout on only the most egregious files but now finish w/o a timeout error on all the projects.
The only thing I find really icky is the
new_nodes
array in lib/ccflay.rb. This is needed for the stable hash algorithm. It is probably missing node types and it would be nice if there was a comprehensive list, but the foreign parsers are messy and relatively undocumented as such. (I'd highly recommend adding a rewrite phase to cleane and speed things up even further).