
Commit a57340e

Rewrite of Regular Expression Search (#401)
* Rewrite of Regular Expression Search
  1. Rephrases "General Syntax"
  2. Adds `owner != ^[a-zA-Z]{3}$` returns empty and non-three-letter owners
  3. Rephrases Searching for strings with double quotation marks (")
  4. Adds Greedy quantifiers
  5. Adds Reluctant quantifiers
  6. Adds Possessive quantifiers
* Fix spelling of empty
* lint
1 parent d8d40de commit a57340e

1 file changed

+73
-20
lines changed
en/finding-sorting-and-cleaning-entries/search.md

@@ -18,35 +18,48 @@ At the right of the search text field, 2 buttons allow for selecting some settin
1818
* Case sensitivity
1919
* Whether or not the search query is case sensitive.
2020

21-
## Simple search
21+
## Simple search <a href="#simple-search" id="simple-search"></a>
2222

2323
In a normal search, the program searches your library for all occurrences of the words in your search string, once you entered it. Only entries containing all words will be considered matches. To search for sequences of words, enclose the sequences in double-quotes. For instance, the query **progress "marine aquaculture"** will match entries containing both the word "progress" and the phrase "marine aquaculture".
2424

2525
All entries that do not match are hidden, leaving for display the matching entries only.
2626

2727
To stop displaying the search results, just clear the search field, press Esc or click on the "Clear" (`X`) button.
2828

29-
## Search using regular expressions <a href="#advanced" id="advanced"></a>
29+
## Search using regular expressions <a href="#regular-expressions" id="regular-expressions"></a>
3030

3131
{% hint style="warning" %}
3232
Make sure that the button "regular expressions" is activated
3333
{% endhint %}
3434

3535
### General syntax
3636

37-
In order to search specific fields only and/or include logical operators in the search expression, a special syntax is available in which these can be specified. E.g. to search for entries whose an author contains **miller**, enter:
37+
To search only within specific fields and/or to combine conditions with logical operators, use the special syntax described in this section. Both the field specification and the search term support [regular expressions](search.md#regular-expressions).
3838

39-
`author = miller`
39+
#### Search within specific Fields
4040

41-
Both the field specification and the search term support [regular expressions](search.md#regular-expressions). If the search term contains spaces, enclose it in quotes. Do _not_ use spaces in the field specification! E.g. to search for entries about image processing, type:
41+
To search for entries whose author contains **miller**, enter: `author = miller`. The `=` sign is actually a shorthand for `contains`. Searching for an exact match is possible using `matches` or `==`.
4242

43-
`title|keywords = "image processing"`
43+
#### Search for terms containing spaces
4444

45-
You can use `and`, `or`, `not`, and parentheses as intuitively expected:
45+
If the search term contains spaces, enclose it in quotes. Do _not_ use spaces in the field specification! E.g., to search for entries whose title contains "image processing", type: `title = "image processing"`
4646

47-
`(author = miller or title|keywords = "image processing") and not author = brown`
47+
#### Search using parentheses, `and`, `or` and `not`
4848

49-
The `=` sign is actually a shorthand for `contains`. Searching for an exact match is possible using `matches` or `==`. Using `!=` tests if the search term is _not_ contained in the field (equivalent to `not ... contains ...`). The selection of field types to search (required, optional, all) is always overruled by the field specification in the search expression. If a field is not given, all fields are searched. For example, `video and year == 1932` will search for entries with any field containing `video` and the field `year` being exactly `1932`.
49+
To search for entries whose title _or_ keywords contain "image processing", type: `title|keywords = "image processing"`. To search for entries whose title and keywords do _not_ contain "image processing", type: `title|keywords != "image processing"`. It is also possible to chain search expressions. In general, you can use `and`, `or`, `not`, and parentheses as intuitively expected:
50+
51+
`(author = miller or title|keywords = "image processing") and not author = brown and author != blue`
52+
53+
Logical Operator / Symbol | Explanation
54+
|:---|:---|
55+
XY | X followed by Y
56+
X\|Y | Either X or Y
57+
(X) | X, as a capturing group
58+
!= | Tests if the search term is _not_ contained in the field (equivalent to `not ... contains ...`)
59+
60+
#### Regular Expression search and Field Types
61+
62+
The selection of field types to search (required, optional, all) is always overruled by the field specification in the search expression. If a field is not given, all fields are searched. For example, `video and year == 1932` will search for entries with any field containing `video` and the field `year` being exactly `1932`.
5063

5164
### Pseudo fields
5265

@@ -60,9 +73,9 @@ JabRef defines the following pseudo fields:
6073
| `key` | Search for citation keys | `citationkey == miller2005`: search for an entry whose citation key is **miller2005** |
6174
| `entrytype` | Search for entries of a certain type | `entrytype = thesis`: search entries whose type (as displayed in the `entrytype` column) contains the word **thesis** (which would be **phdthesis** and **mastersthesis**) |
6275

63-
### Advanced use of regular expressions
76+
### Advanced use of regular expressions <a href="#regular-expressions-advanced" id="regular-expressions-advanced"></a>
6477

65-
Regular expressions (regex for short) define a language for specifying the text to be matched, for example when searching. JabRef uses regular expressions as defined in Java. For extensive information, please, look at the [Java documentation](https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/regex/Pattern.html) and at the [Java tutorial](https://docs.oracle.com/javase/tutorial/essential/regex/).
78+
Regular expressions (RegEx for short) define a language for representing patterns matching text, for example when searching. There are different types of RegEx languages. JabRef uses regular expressions as defined in Java. For extensive advanced information about Java's RegEx patterns, please have a look at the [Java documentation](https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/regex/Pattern.html) and at the [Java tutorial](https://docs.oracle.com/javase/tutorial/essential/regex/).
6679

6780
#### Regular expressions and casing
6881

@@ -72,15 +85,22 @@ If casing is important to your search, activate the case-sensitive button.
7285

7386
#### Searching for entries with an empty or missing field
7487

75-
* `.` means any character
76-
* `+` means one or more times
88+
* `.` means: any character
89+
* `+` means: one or more times
7790

7891
`author != .+` returns entries with empty or no author field.
7992

93+
* `^` means: the beginning of a line
94+
* `[a-zA-Z]` means: a through z or A through Z, inclusive (range)
95+
* `$` means: the end of a line
96+
* `X{n}` means: X, exactly n times
97+
98+
`owner != ^[a-zA-Z]{3}$` returns entries whose owner is empty or does not consist of exactly three letters.
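
The effect of the anchors and the `{3}` quantifier can be tried directly against Java's regex engine. This is only a sketch: the helper `notContains` is hypothetical and assumes that `!=` behaves like a negated `Matcher.find()` and that a missing field is treated like an empty one. It does not use JabRef's own search code.

```java
import java.util.regex.Pattern;

class OwnerFilterDemo {

    // Hypothetical stand-in for `field != regex`: true when the regex is NOT found in the value.
    // Assumption: a missing field (null) is handled like an empty string.
    static boolean notContains(String fieldValue, String regex) {
        String value = fieldValue == null ? "" : fieldValue;
        return !Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        String regex = "^[a-zA-Z]{3}$";
        for (String owner : new String[] {"abc", "", "abcd", "ab1"}) {
            System.out.println("'" + owner + "' returned: " + notContains(owner, regex));
        }
        System.out.println("missing owner returned: " + notContains(null, regex));
    }
}
```

Under these assumptions only `abc` is filtered out; the empty, four-letter, and digit-containing owners are all returned.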
99+
80100
#### Searching for a given word
81101

82-
* `\b` means word boundary
83-
* `\B` means not a word boundary
102+
* `\b` means: word boundary
103+
* `\B` means: not a word boundary
84104

85105
`keywords = \buv\b` matches _uv_ but not _lluvia_ (it does match _uv-b_ however)
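
A quick way to check the word-boundary behaviour is to run the same pattern through `java.util.regex`. The helper `contains` below is hypothetical and assumes that `=` corresponds to `Matcher.find()` on the field value.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {

    // Hypothetical stand-in for `field = regex` (assumed to behave like Matcher.find()).
    static boolean contains(String fieldValue, String regex) {
        return Pattern.compile(regex).matcher(fieldValue).find();
    }

    public static void main(String[] args) {
        String regex = "\\buv\\b"; // the regex \buv\b, written as a Java string literal
        for (String keywords : new String[] {"uv", "lluvia", "uv-b"}) {
            System.out.println(keywords + ": " + contains(keywords, regex));
        }
    }
}
```

The hyphen in _uv-b_ is not a word character, so a word boundary sits between `v` and `-`, which is why that keyword still matches.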
86106

@@ -92,8 +112,8 @@ If casing is important to your search, activate the case-sensitive button.
92112

93113
#### Searching with optional spelling
94114

95-
* `?` means none or one copy of the preceding character.
96-
* `{n,m}` means at least _n_, but not more than _m_ copies of the preceding character.
115+
* `?` means: none or one copy of the preceding character.
116+
* `{n,m}` means: at least _n_, but not more than _m_ copies of the preceding character.
97117
* `[ ]` defines a character class
98118

99119
`title =neighbou?r` matches _neighbour_ and _neighbor_, and also _neighbours_ and _neighbors_, and _neighbouring_ and _neighboring_, etc.
@@ -122,6 +142,39 @@ It means that to search for a string including a backslash, two consecutive back
122142

123143
The character `"` has a special meaning: it is used to group words into phrases for exact matches. So, if you search for a string that includes a double quotation mark, replace that character with `\x22`, its hexadecimal code (22) in the ASCII table.
124144

125-
Hence, to search for `{\"o}quist` as an author, you must input `author = \{\\\x22o\}quist`, with regular expressions enabled (Note: the `{`, `_` and the `}` are escaped with a backslash; see above).
126-
127-
Indeed, `\"` does not work as an escape for `"`. Hence, neither `author = {\"o}quist` with regular expression disabled, nor `author = \{\\\"O\}quist` with regular expression enabled, will find anything even if the name `{\"o}quist` exists in the library.
145+
Neither a single backslash `\"` nor a double backslash `\\"` works as an escape for `"`. Neither `author = {\"o}quist` with regular expressions disabled, nor `author = \{\\\"O\}quist` with regular expressions enabled, will find anything, even if the name `{\"o}quist` exists in the library.
146+
147+
Hence, to search for `{\"o}quist` as an author, you must input `author = \{\\\x22o\}quist`, with regular expressions enabled (Note: the `\`, the `{` and the `}` are escaped with a backslash; see above).
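
To see why `\x22` works at the regex level, the pattern can be tested against the stored field value with plain Java. This sketch only exercises Java's regex engine, not JabRef's query parser (which is where `"` gets its special meaning); the helper `contains` is hypothetical and assumes that `=` maps to `Matcher.find()`. Note that every backslash is doubled once more because the values are written as Java string literals.

```java
import java.util.regex.Pattern;

class QuoteEscapeDemo {

    // Hypothetical stand-in for `field = regex` (assumed to behave like Matcher.find()).
    static boolean contains(String fieldValue, String regex) {
        return Pattern.compile(regex).matcher(fieldValue).find();
    }

    public static void main(String[] args) {
        // Field value as stored in the library: {\"o}quist
        String author = "{\\\"o}quist";
        // Search term from the text: \{\\\x22o\}quist
        String regex = "\\{\\\\\\x22o\\}quist";
        System.out.println(contains(author, regex));
    }
}
```

In the pattern, `\{` and `\}` match the literal braces, `\\` matches the literal backslash, and `\x22` matches the double quotation mark.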
148+
149+
#### Greedy quantifiers
150+
151+
Quantifier | Explanation
152+
|:---|:---|
153+
X? | X, once or not at all
154+
X* | X, zero or more times
155+
X+ | X, one or more times
156+
X{n} | X, exactly n times
157+
X{n,} | X, at least n times
158+
X{n,m}| X, at least n but not more than m times
159+
160+
#### Reluctant quantifiers
161+
162+
Quantifier | Explanation
163+
|:---|:---|
164+
X?? | X, once or not at all
165+
X*? | X, zero or more times
166+
X+? | X, one or more times
167+
X{n}? | X, exactly n times
168+
X{n,}? | X, at least n times
169+
X{n,m}? | X, at least n but not more than m times
170+
171+
#### Possessive quantifiers
172+
173+
Quantifier | Explanation
174+
|:---|:---|
175+
X?+ | X, once or not at all
176+
X*+ | X, zero or more times
177+
X++ | X, one or more times
178+
X{n}+ | X, exactly n times
179+
X{n,}+ | X, at least n times
180+
X{n,m}+ | X, at least n but not more than m times
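
The three tables use the same wording because the quantifiers accept the same number of repetitions; they differ in how much text they try to consume and whether they give characters back. A greedy quantifier takes as much as possible and backtracks if the rest of the pattern fails, a reluctant one takes as little as possible and extends only when needed, and a possessive one takes as much as possible and never gives anything back. The following sketch illustrates this with `java.util.regex`; the class and helper names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {

    // Returns the first substring of input matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println("greedy     .*foo  -> " + firstMatch(".*foo", input));
        System.out.println("reluctant  .*?foo -> " + firstMatch(".*?foo", input));
        System.out.println("possessive .*+foo -> " + firstMatch(".*+foo", input));
    }
}
```

Here the greedy pattern matches the whole input, the reluctant one stops at the first `xfoo`, and the possessive one finds nothing because `.*+` swallows the trailing `foo` and does not release it.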
