pub struct Captures<'h> { /* private fields */ }
Represents the capture groups for a single match.
Capture groups refer to parts of a regex enclosed in parentheses. They can be optionally named. The purpose of capture groups is to be able to reference different parts of a match based on the original pattern. For example, say you want to match the individual letters in a 5-letter word:
(?<first>\w)(\w)(?:\w)\w(?<last>\w)
This regex has 4 capture groups:
- The group at index 0 corresponds to the overall match. It is always present in every match and never has a name.
- The group at index 1 with name first corresponding to the first letter.
- The group at index 2 with no name corresponding to the second letter.
- The group at index 3 with name last corresponding to the fifth and last letter.
Notice that (?:\w) was not listed above as a capture group despite it being enclosed in parentheses. That’s because (?:pattern) is a special syntax that permits grouping but without capturing. The reason for not treating it as a capture is that tracking and reporting capture groups requires additional state that may lead to slower searches. So using as few capture groups as possible can help performance. (Although the difference in performance of a couple of capture groups is likely immaterial.)
Values with this type are created by Regex::captures or Regex::captures_iter.
'h is the lifetime of the haystack that these captures were matched from.
Example
use regex::bytes::Regex;
let re = Regex::new(r"(?<first>\w)(\w)(?:\w)\w(?<last>\w)").unwrap();
let caps = re.captures(b"toady").unwrap();
assert_eq!(b"toady", &caps[0]);
assert_eq!(b"t", &caps["first"]);
assert_eq!(b"o", &caps[2]);
assert_eq!(b"y", &caps["last"]);
Implementations
impl<'h> Captures<'h>
pub fn get(&self, i: usize) -> Option<Match<'h>>
Returns the Match associated with the capture group at index i. If i does not correspond to a capture group, or if the capture group did not participate in the match, then None is returned.
When i == 0, this is guaranteed to return a non-None value.
Examples
Get the substring that matched with a default of an empty string if the group didn’t participate in the match:
use regex::bytes::Regex;
let re = Regex::new(r"[a-z]+(?:([0-9]+)|([A-Z]+))").unwrap();
let caps = re.captures(b"abc123").unwrap();
let substr1 = caps.get(1).map_or(&b""[..], |m| m.as_bytes());
let substr2 = caps.get(2).map_or(&b""[..], |m| m.as_bytes());
assert_eq!(substr1, b"123");
assert_eq!(substr2, b"");
pub fn name(&self, name: &str) -> Option<Match<'h>>
Returns the Match associated with the capture group named name. If name isn’t a valid capture group or it refers to a group that didn’t match, then None is returned.
Note that unlike caps["name"], this returns a Match whose lifetime matches the lifetime of the haystack in this Captures value. Conversely, the substring returned by caps["name"] has a lifetime of the Captures value, which is likely shorter than the lifetime of the haystack. In some cases, it may be necessary to use this method to access the matching substring instead of the caps["name"] notation.
Examples
Get the substring that matched with a default of an empty string if the group didn’t participate in the match:
use regex::bytes::Regex;
let re = Regex::new(
    r"[a-z]+(?:(?<numbers>[0-9]+)|(?<letters>[A-Z]+))",
).unwrap();
let caps = re.captures(b"abc123").unwrap();
let numbers = caps.name("numbers").map_or(&b""[..], |m| m.as_bytes());
let letters = caps.name("letters").map_or(&b""[..], |m| m.as_bytes());
assert_eq!(numbers, b"123");
assert_eq!(letters, b"");
pub fn extract<const N: usize>(&self) -> (&'h [u8], [&'h [u8]; N])
This is a convenience routine for extracting the substrings corresponding to matching capture groups.
This returns a tuple where the first element corresponds to the full substring of the haystack that matched the regex. The second element is an array of substrings, with each corresponding to the substring that matched for a particular capture group.
Panics
This panics if the number of possible matching groups in this Captures value is not fixed to N in all circumstances. More precisely, this routine only works when N is equal to Regex::static_captures_len minus one, since the implicit group at index 0 is returned separately as the first element of the tuple.
Stated more plainly, if the number of matching capture groups in a regex can vary from match to match, then this function always panics.
For example, (a)(b)|(c) could produce two matching capture groups or one matching capture group for any given match. Therefore, one cannot use extract with such a pattern.
But a pattern like (a)(b)|(c)(d) can be used with extract because the number of capture groups in every match is always equivalent, even if the capture indices in each match are not.
Example
use regex::bytes::Regex;
let re = Regex::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})").unwrap();
let hay = b"On 2010-03-14, I became a Tennessee lamb.";
let Some((full, [year, month, day])) =
    re.captures(hay).map(|caps| caps.extract()) else { return };
assert_eq!(b"2010-03-14", full);
assert_eq!(b"2010", year);
assert_eq!(b"03", month);
assert_eq!(b"14", day);
Example: iteration
This example shows how to use this method when iterating over all Captures matches in a haystack.
use regex::bytes::Regex;
let re = Regex::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})").unwrap();
let hay = b"1973-01-05, 1975-08-25 and 1980-10-18";
let mut dates: Vec<(&[u8], &[u8], &[u8])> = vec![];
for (_, [y, m, d]) in re.captures_iter(hay).map(|c| c.extract()) {
    dates.push((y, m, d));
}
assert_eq!(dates, vec![
    (&b"1973"[..], &b"01"[..], &b"05"[..]),
    (&b"1975"[..], &b"08"[..], &b"25"[..]),
    (&b"1980"[..], &b"10"[..], &b"18"[..]),
]);
Example: parsing different formats
This API is particularly useful when you need to extract a particular value that might occur in a different format. Consider, for example, an identifier that might be in double quotes or single quotes:
use regex::bytes::Regex;
let re = Regex::new(r#"id:(?:"([^"]+)"|'([^']+)')"#).unwrap();
let hay = br#"The first is id:"foo" and the second is id:'bar'."#;
let mut ids = vec![];
for (_, [id]) in re.captures_iter(hay).map(|c| c.extract()) {
    ids.push(id);
}
assert_eq!(ids, vec![b"foo", b"bar"]);
pub fn expand(&self, replacement: &[u8], dst: &mut Vec<u8>)
Expands all instances of $ref in replacement to the corresponding capture group, and writes them to the dst buffer given. A ref can be a capture group index or a name. If ref doesn’t refer to a capture group that participated in the match, then it is replaced with the empty string.
Format
The format of the replacement string supports two different kinds of capture references: unbraced and braced.
For the unbraced format, the format supported is $ref where ref can consist of any characters in the class [0-9A-Za-z_]. ref is always the longest possible parse. So for example, $1a corresponds to the capture group named 1a and not the capture group at index 1. If ref matches ^[0-9]+$, then it is treated as a capture group index itself and not a name.
For the braced format, the format supported is ${ref} where ref can be any sequence of bytes except for }. If no closing brace occurs, then it is not considered a capture reference. As with the unbraced format, if ref matches ^[0-9]+$, then it is treated as a capture group index and not a name.
The braced format is useful for exerting precise control over the name of the capture reference. For example, ${1}a corresponds to the capture group reference 1 followed by the letter a, whereas $1a (as mentioned above) corresponds to the capture group reference 1a.
The braced format is also useful for expressing capture group names that use characters not supported by the unbraced format. For example, ${foo[bar].baz} refers to the capture group named foo[bar].baz.
If a capture group reference is found and it does not refer to a valid capture group, then it will be replaced with the empty string.
To write a literal $, use $$.
Example
use regex::bytes::Regex;
let re = Regex::new(
    r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
).unwrap();
let hay = b"On 14-03-2010, I became a Tennessee lamb.";
let caps = re.captures(hay).unwrap();
let mut dst = vec![];
caps.expand(b"year=$year, month=$month, day=$day", &mut dst);
assert_eq!(dst, b"year=2010, month=03, day=14");
pub fn iter<'c>(&'c self) -> SubCaptureMatches<'c, 'h>
Returns an iterator over all capture groups. This includes both matching and non-matching groups.
The iterator always yields at least one matching group: the first group (at index 0) with no name. Subsequent groups are returned in the order of their opening parenthesis in the regex.
The elements yielded have type Option<Match<'h>>, where a non-None value is present if the capture group matches.
Example
use regex::bytes::Regex;
let re = Regex::new(r"(\w)(\d)?(\w)").unwrap();
let caps = re.captures(b"AZ").unwrap();
let mut it = caps.iter();
assert_eq!(it.next().unwrap().map(|m| m.as_bytes()), Some(&b"AZ"[..]));
assert_eq!(it.next().unwrap().map(|m| m.as_bytes()), Some(&b"A"[..]));
assert_eq!(it.next().unwrap().map(|m| m.as_bytes()), None);
assert_eq!(it.next().unwrap().map(|m| m.as_bytes()), Some(&b"Z"[..]));
assert_eq!(it.next(), None);
pub fn len(&self) -> usize
Returns the total number of capture groups. This includes both matching and non-matching groups.
The length returned is always equivalent to the number of elements yielded by Captures::iter. Consequently, the length is always greater than zero since every Captures value always includes the match for the entire regex.
Example
use regex::bytes::Regex;
let re = Regex::new(r"(\w)(\d)?(\w)").unwrap();
let caps = re.captures(b"AZ").unwrap();
assert_eq!(caps.len(), 4);
Trait Implementations
impl<'h, 'n> Index<&'n str> for Captures<'h>
Get a matching capture group’s haystack substring by name.
The haystack substring returned can’t outlive the Captures object if this method is used, because of how Index is defined (normally a[i] is part of a and can’t outlive it). To work around this limitation, use Captures::name instead.
'h is the lifetime of the matched haystack, but the lifetime of the &[u8] returned by this implementation is the lifetime of the Captures value itself.
'n is the lifetime of the group name used to index the Captures value.
Panics
If there is no matching group at the given name.
impl<'h> Index<usize> for Captures<'h>
Get a matching capture group’s haystack substring by index.
The haystack substring returned can’t outlive the Captures object if this method is used, because of how Index is defined (normally a[i] is part of a and can’t outlive it). To work around this limitation, use Captures::get instead.
'h is the lifetime of the matched haystack, but the lifetime of the &[u8] returned by this implementation is the lifetime of the Captures value itself.
Panics
If there is no matching group at the given index.