Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change comment directive parsing #1149

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion History.rdoc
Original file line number Diff line number Diff line change
Expand Up @@ -163,7 +163,7 @@
* Moved old DEVELOPERS file to CONTRIBUTING to match github conventions.
* TomDoc output now has a "Returns" heading. Issue #234 by Brian Henderson
* Metaprogrammed methods can now use the :args: directive in addition to the
:call-seq: directive. Issue #236 by Mike Moore.
\:call-seq: directive. Issue #236 by Mike Moore.
* Sections can be linked to using "@" like labels. If a section and a label
have the same name the section will be preferred. Issue #233 by Brian
Henderson.
Expand Down
2 changes: 1 addition & 1 deletion doc/rdoc/markup_reference.rb
Original file line number Diff line number Diff line change
Expand Up @@ -1245,7 +1245,7 @@ def dummy_instance_method(foo, bar); end;
#
# Here is the <tt>:call-seq:</tt> directive given for the method:
#
# :call-seq:
# \:call-seq:
# call_seq_directive(foo, bar)
# Can be anything -> bar
# Also anything more -> baz or bat
Expand Down
198 changes: 190 additions & 8 deletions lib/rdoc/comment.rb
Original file line number Diff line number Diff line change
Expand Up @@ -162,6 +162,12 @@ def normalize
self
end

# Change normalized, when creating already normalized comment.

def normalized=(value)
@normalized = value
end

##
# Was this text normalized?

Expand Down Expand Up @@ -223,14 +229,190 @@ def tomdoc?
@format == 'tomdoc'
end

##
# Create a new parsed comment from a document
MULTILINE_DIRECTIVES = %w[call-seq].freeze # :nodoc:

def self.from_document(document) # :nodoc:
comment = RDoc::Comment.new('')
comment.document = document
comment.location = RDoc::TopLevel.new(document.file) if document.file
comment
end
# There are more, but already handled by RDoc::Parser::C
COLON_LESS_DIRECTIVES = %w[call-seq Document-method].freeze # :nodoc:

DIRECTIVE_OR_ESCAPED_DIRECTIV_REGEXP = /\A(?<colon>\\?:|:?)(?<directive>[\w-]+):(?<param>.*)/

private_constant :MULTILINE_DIRECTIVES, :COLON_LESS_DIRECTIVES, :DIRECTIVE_OR_ESCAPED_DIRECTIV_REGEXP

class << self

##
# Create a new parsed comment from a document

def from_document(document) # :nodoc:
comment = RDoc::Comment.new('')
comment.document = document
comment.location = RDoc::TopLevel.new(document.file) if document.file
comment
end

# Parse comment, collect directives as an attribute and return [normalized_comment_text, directives_hash]
# This method expands include and removes everything not needed in the document text, such as
# private section, directive line, comment characters `# /* * */` and indent spaces.
#
# RDoc comment consists of include, directive, multiline directive, private section and comment text.
#
# Include
# # :include: filename
#
# Directive
# # :directive-without-value:
# # :directive-with-value: value
#
# Multiline directive (only :call-seq:)
# # :multiline-directive:
# # value1
# # value2
#
# Private section
# #--
# # private comment
# #++

def parse(text, filename, line_no, type, &include_callback)
case type
when :ruby
text = text.gsub(/^#+/, '') if text.start_with?('#')
private_start_regexp = /^-{2,}$/
private_end_regexp = /^\+{2}$/
indent_regexp = /^\s*/
when :c
private_start_regexp = /^(\s*\*)?-{2,}$/
private_end_regexp = /^(\s*\*)?\+{2}$/
indent_regexp = /^\s*(\/\*+|\*)?\s*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we store these static regexp objects in constants instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It already have an explicit name private_start_regexp and I don't see much benefit of making it a constant.

C_PRIVATE_START_REGEXP = /regexp/
private_constant :C_PRIVATE_START_REGEXP, :C_...

...
  when :c
    private_start_regexp = C_PRIVATE_START_REGEXP

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why can't we directly refer to C_PRIVATE_START_REGEXP without defining locals for them 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because these locals are a layer to absorbs differences between types

when :ruby
  private_start_regexp = /^-{2,}$/
when :c
  private_start_regexp = /^(\s*\*)?-{2,}$/
when :simple
  private_start_regexp = /^-{2}$/
end
...
line.match?(private_start_regexp)
...
value_lines = take_multiline_directive_value_lines(..., indent_regexp, ...)

I think it could be simplified to:

  1. Remove # or * depend on type first
  2. Use the same regexp for all types

But it will introduce incompatibility.

horizontal divider between two paragraphs in :simple
---

--
only two dashes starts private section in :simple
++

#-----
# Two or more dashes after # are private section start in :ruby
# Tested in RDocCommentTest#test_remove_private_long
#++

Removing spaces before * instead of replacing * to a space (needed to distinguish *--- from * ---) also brings changes in a c comment where * are not aligned. (I think this is a small problem though)

  /*
*    :call-seq:
   *   foo(bar)
    */

text = text.gsub(/\s*\*+\/\s*\z/, '')
when :simple
# Unlike other types, this implementation only looks for two dashes at
# the beginning of the line. Three or more dashes are considered to be
# a rule and ignored.
private_start_regexp = /^-{2}$/
private_end_regexp = /^\+{2}$/
indent_regexp = /^\s*/
end

directives = {}
lines = text.split("\n")
in_private = false
comment_lines = []
until lines.empty?
line = lines.shift
read_lines = 1
if in_private
# If `++` appears in a private section that starts with `--`, private section ends.
in_private = false if line.match?(private_end_regexp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to flip in_private here? Can we add a comment explaining it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private_start_regexp (matches to --) begins private section (in_private = true)
private_end_regexp (matchs to ++) ends private section (in_private = false)

comment added

line_no += read_lines
next
elsif line.match?(private_start_regexp)
# If `--` appears in a line, private section starts.
in_private = true
line_no += read_lines
next
end

prefix = line[indent_regexp]
prefix_indent = ' ' * prefix.size
line = line.byteslice(prefix.bytesize..)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add param.strip! here so we don't need to call param.strip in multiple branches?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch 👍
changed to:

raw_param = directive_match[:param]
param = raw_param.strip

Unstripped raw_param is also used to reject :toto:: like in the old implementation in pre_process.rb

# skip something like ':toto::'

if (directive_match = DIRECTIVE_OR_ESCAPED_DIRECTIV_REGEXP.match(line))
colon = directive_match[:colon]
directive = directive_match[:directive]
raw_param = directive_match[:param]
param = raw_param.strip
else
colon = directive = raw_param = param = nil
end

if !directive
comment_lines << prefix_indent + line
elsif colon == '\\:'
# If directive is escaped, unescape it
comment_lines << prefix_indent + line.sub('\\:', ':')
elsif raw_param.start_with?(':') || (colon.empty? && !COLON_LESS_DIRECTIVES.include?(directive))
# Something like `:toto::` is not a directive
# Only few directives allows to start without a colon
comment_lines << prefix_indent + line
elsif directive == 'include'
filename_to_include = param
include_callback.call(filename_to_include, prefix_indent).lines.each { |l| comment_lines << l.chomp }
elsif MULTILINE_DIRECTIVES.include?(directive)
value_lines = take_multiline_directive_value_lines(directive, filename, line_no, lines, prefix_indent.size, indent_regexp, !param.empty?)
read_lines += value_lines.size
lines.shift(value_lines.size)
unless param.empty?
# Accept `:call-seq: first-line\n second-line` for now
value_lines.unshift(param)
end
value = value_lines.join("\n")
directives[directive] = [value.empty? ? nil : value, line_no]
else
directives[directive] = [param.empty? ? nil : param, line_no]
end
line_no += read_lines
end

normalized_comment = String.new(encoding: text.encoding) << normalize_comment_lines(comment_lines).join("\n")
[normalized_comment, directives]
end

# Remove preceding indent spaces and blank lines from the comment lines

private def normalize_comment_lines(lines)
blank_line_regexp = /\A\s*\z/
lines = lines.dup
lines.shift while lines.first&.match?(blank_line_regexp)
lines.pop while lines.last&.match?(blank_line_regexp)

min_spaces = lines.map do |l|
l.match(/\A *(?=\S)/)&.end(0)
end.compact.min
if min_spaces && min_spaces > 0
lines.map { |l| l[min_spaces..] || '' }
else
lines
end
end

# Take value lines of multiline directive

private def take_multiline_directive_value_lines(directive, filename, line_no, lines, base_indent_size, indent_regexp, has_param)
return [] if lines.empty?

first_indent_size = lines.first.match(indent_regexp).end(0)

# Blank line or unindented line is not part of multiline-directive value
return [] if first_indent_size <= base_indent_size

if has_param
# :multiline-directive: line1
# line2
# line3
#
value_lines = lines.take_while do |l|
l.rstrip.match(indent_regexp).end(0) > base_indent_size
end
min_indent = value_lines.map { |l| l.match(indent_regexp).end(0) }.min
value_lines.map { |l| l[min_indent..] }
else
# Take indented lines accepting blank lines between them
value_lines = lines.take_while do |l|
l = l.rstrip
indent = l[indent_regexp]
if indent == l || indent.size >= first_indent_size
true
end
end
value_lines.map! { |l| (l[first_indent_size..] || '').chomp }

if value_lines.size != lines.size && !value_lines.last.empty?
warn "#{filename}:#{line_no} Multiline directive :#{directive}: should end with a blank line."
end
value_lines.pop while value_lines.last&.empty?
value_lines
end
end
end
end
45 changes: 35 additions & 10 deletions lib/rdoc/markup/pre_process.rb
Original file line number Diff line number Diff line change
Expand Up @@ -97,18 +97,15 @@ def initialize(input_file_name, include_path)
# RDoc::CodeObject#metadata for details.

def handle text, code_object = nil, &block
first_line = 1
if RDoc::Comment === text then
comment = text
text = text.text
first_line = comment.line || 1
end

# regexp helper (square brackets for optional)
# $1 $2 $3 $4 $5
# [prefix][\]:directive:[spaces][param]newline
text = text.lines.map.with_index(first_line) do |line, num|
next line unless line =~ /\A([ \t]*(?:#|\/?\*)?[ \t]*)(\\?):([\w-]+):([ \t]*)(.+)?(\r?\n|$)/
text = text.gsub(/^([ \t]*(?:#|\/?\*)?[ \t]*)(\\?):([\w-]+):([ \t]*)(.+)?(\r?\n|$)/) do
# skip something like ':toto::'
next $& if $4.empty? and $5 and $5[0, 1] == ':'

Expand All @@ -122,21 +119,49 @@ def handle text, code_object = nil, &block
comment.format = $5.downcase
next "#{$1.strip}\n"
end

handle_directive $1, $3, $5, code_object, text.encoding, num, &block
end.join
handle_directive $1, $3, $5, code_object, text.encoding, &block
end

if comment then
comment.text = text
else
comment = text
end

run_post_processes(comment, code_object)

text
end

# Apply directives to a code object

def run_pre_processes(comment_text, code_object, start_line_no, type)
comment_text, directives = parse_comment(comment_text, start_line_no, type)
directives.each do |directive, (param, line_no)|
handle_directive('', directive, param, code_object)
end
if code_object.is_a?(RDoc::AnyMethod) && (call_seq, = directives['call-seq']) && call_seq
code_object.call_seq = call_seq.lines.map(&:chomp).reject(&:empty?).join("\n") if call_seq
end
format, = directives['markup']
[comment_text, format]
end


# Perform post preocesses to a code object

def run_post_processes(comment, code_object)
self.class.post_processors.each do |handler|
handler.call comment, code_object
end
end

text
# Parse comment and return [normalized_comment_text, directives_hash]

def parse_comment(text, line_no, type)
RDoc::Comment.parse(text, @input_file_name, line_no, type) do |filename, prefix_indent|
include_file(filename, prefix_indent, text.encoding)
end
end

##
Expand All @@ -151,7 +176,7 @@ def handle text, code_object = nil, &block
# When 1.8.7 support is ditched prefix can be defaulted to ''

def handle_directive prefix, directive, param, code_object = nil,
encoding = nil, line = nil
encoding = nil
blankline = "#{prefix.strip}\n"
directive = directive.downcase

Expand Down Expand Up @@ -244,7 +269,7 @@ def handle_directive prefix, directive, param, code_object = nil,

blankline
else
result = yield directive, param, line if block_given?
result = yield directive, param if block_given?

case result
when nil then
Expand Down
Loading
Loading