Replacement patterns
To use a numbered capture group, surround the group with parentheses in the regular expression pattern. Use $number, where number is an integer starting at 1, to specify a specific, numbered group in a replacement pattern. For example, the grouped regular expression (\d)([a-z]) defines two groups: the first group contains a single decimal digit, and the second group contains a single character between a and z. The expression finds four matches in the following string: 1a 2b 3c 4d. The replacement string z$1 references the first group only, and converts the string to z1 z2 z3 z4.
For information about regular expressions that are used in replacement patterns, see Substitutions in regular expressions (.NET guide).
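The replacement described above can be reproduced with the Regex.Replace method. The following is a minimal sketch, assuming a console program; the class name NumberedGroupDemo and the method PrefixDigits are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberedGroupDemo
{
    // Replaces each digit-letter pair with "z" followed by the captured digit.
    public static string PrefixDigits(string input)
    {
        // Group 1 is (\d). Group 2 is ([a-z]) and is intentionally not referenced.
        return Regex.Replace(input, @"(\d)([a-z])", "z$1");
    }

    public static void Main()
    {
        Console.WriteLine(PrefixDigits("1a 2b 3c 4d")); // z1 z2 z3 z4
    }
}
```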
Regular expression examples
Here are some examples:
Purpose | Expression | Example |
---|---|---|
Match any single character (except a line break) | . | a.o matches "aro" in "around" and "abo" in "about" but not "acro" in "across". |
Match zero or more occurrences of the preceding expression (match as many characters as possible) | * | a*r matches "r" in "rack", "ar" in "ark", and "aar" in "aardvark" |
Match any character zero or more times (Wildcard *) | .* | c.*e matches "cke" in "racket", "comme" in "comment", and "code" in "code" |
Match one or more occurrences of the preceding expression (match as many characters as possible) | + | e.+d matches "eed" in "feeder" but not "ed". |
Match any character one or more times (Wildcard ?) | .+ | e.+e matches "eede" in "feeder" but not "ee". |
Match zero or more occurrences of the preceding expression (match as few characters as possible) | *? | e.*?e matches "ee" in "feeder" but not "eede". |
Match one or more occurrences of the preceding expression (match as few characters as possible) | +? | e.+?e matches "ente" in "enterprise", but not the whole word "enterprise". |
Anchor the match string to the beginning of a line or string | ^ | ^car matches the word "car" only when it appears at the beginning of a line. |
Anchor the match string to the end of a line | \r?$ | end\r?$ matches "end" only when it appears at the end of a line. |
Anchor the match string to the end of the file | $ | end$ matches "end" only when it appears at the end of the file. |
Match any single character in a set | [abc] | b[abc] matches "ba", "bb", and "bc". |
Match any character in a range of characters | [a-f] | be[n-t] matches "bet" in "between", "ben" in "beneath", and "bes" in "beside", but not "below". |
Capture and implicitly number the expression contained within parentheses | () | ([a-z])X\1 matches "aXa" and "bXb", but not "aXb". "\1" refers to the first expression group "[a-z]". |
Invalidate a match | (?!abc) | real(?!ity) matches "real" in "realty" and "really" but not in "reality." It also finds the second "real" (but not the first "real") in "realityreal". |
Match any character that is not in a given set of characters | [^abc] | be[^n-t] matches "bef" in "before", "beh" in "behind", and "bel" in "below", but not "beneath". |
Match either the expression before or the one after the symbol | \| | (sponge\|mud) bath matches "sponge bath" and "mud bath". |
Escape the character following the backslash | \ | \^ matches the character ^. |
Specify the number of occurrences of the preceding character or group | {x}, where x is the number of occurrences | x(ab){2}x matches "xababx", and x(ab){2,3}x matches "xababx" and "xabababx" but not "xababababx". |
Match text in a Unicode character class. For more information about Unicode character classes, see Unicode Standard 5.2 Character Properties. | \p{X}, where "X" is the name of the Unicode category or block. | \p{Lu} matches "T" and "D" in "Thomas Doe". |
Match a word boundary | \b (Outside a character class \b specifies a word boundary, and inside a character class \b specifies a backspace.) | \bin matches "in" in "inside" but not "pinto". |
Match a line break (that is, a carriage return followed by a new line). | \r?\n | End\r?\nBegin matches "End" and "Begin" only when "End" is the last string in a line and "Begin" is the first string in the next line. |
Match any alphanumeric character | \w | a\wd matches "add" and "a1d" but not "a d". |
Match any whitespace character. | (?([^\r\n])\s) | Public\sInterface matches the phrase "Public Interface". |
Match any numeric character | \d | \d matches "3" in "3456", "2" in "23", and "1" in "1". |
Match a Unicode character | \uXXXX where XXXX specifies the Unicode character value. | \u0065 matches the character "e". |
Match an identifier | \b[_\w-[0-9]][_\w]*\b | Matches "type1" but not "&type1" or "#define". |
Match a string inside quotes | ((\".+?\")\|('.+?')) | Matches any string inside single or double quotes. |
Match a hexadecimal number | \b0[xX]([0-9a-fA-F]{1,4})\b | Matches "0xc67f" but not "0xc67fc67f". |
Match integers and decimals | \b[0-9]*\.*[0-9]+\b | Matches "1.333". |
Tip
In Windows operating systems, most lines end in "\r\n" (a carriage return followed by a new line). These characters aren't visible, but are present in the editor and are passed to the .NET regular expression service.
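A few rows of the table above can be checked against the .NET Regex class directly. The following is a minimal sketch rather than part of the original reference; the class TableChecks and the helper FirstMatch are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TableChecks
{
    // Returns the text of the first match, or an empty string when nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Greedy: .+ takes as many characters as it can before the final "e".
        Console.WriteLine(FirstMatch("feeder", "e.+e"));   // eede
        // Lazy: .*? takes as few characters as it can.
        Console.WriteLine(FirstMatch("feeder", "e.*?e"));  // ee
        // Negative lookahead: "real" is rejected when "ity" follows it.
        Console.WriteLine(Regex.Match("realityreal", "real(?!ity)").Index); // 7
    }
}
```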
Substitutions are language elements that are recognized only within replacement patterns. They use a regular expression pattern to define all or part of the text that is to replace matched text in the input string. The replacement pattern can consist of one or more substitutions along with literal characters. Replacement patterns are provided to overloads of the Regex.Replace method that have a replacement parameter and to the Match.Result method. The methods replace the matched pattern with the pattern that is defined by the replacement parameter.
The .NET Framework defines the substitution elements listed in the following table.
Substitution | Description |
---|---|
$number | Includes the last substring matched by the capturing group that is identified by number, where number is a decimal value, in the replacement string. For more information, see Substituting a Numbered Group. |
${name} | Includes the last substring matched by the named group that is designated by (?<name> ) in the replacement string. For more information, see Substituting a Named Group. |
$$ | Includes a single "$" literal in the replacement string. For more information, see Substituting a "$" Character. |
$& | Includes a copy of the entire match in the replacement string. For more information, see Substituting the Entire Match. |
$` | Includes all the text of the input string before the match in the replacement string. For more information, see Substituting the Text Before the Match. |
$' | Includes all the text of the input string after the match in the replacement string. For more information, see Substituting the Text After the Match. |
$+ | Includes the last group captured in the replacement string. For more information, see Substituting the Last Captured Group. |
$_ | Includes the entire input string in the replacement string. For more information, see Substituting the Entire Input String. |
Substitution Elements and Replacement Patterns
Substitutions are the only special constructs recognized in a replacement pattern. None of the other regular expression language elements, including character escapes and the period (.), which matches any character, are supported. Similarly, substitution language elements are recognized only in replacement patterns and are never valid in regular expression patterns.
The only character that can appear either in a regular expression pattern or in a substitution is the $ character, although it has a different meaning in each context. In a regular expression pattern, $ is an anchor that matches the end of the string. In a replacement pattern, $ indicates the beginning of a substitution.
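The difference between the two contexts can be seen in a short sketch. It assumes a console program, and the class ReplacementLiteralsDemo and its methods are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplacementLiteralsDemo
{
    // The pattern \. matches a literal period. In the replacement, \t is not
    // an escape: it is inserted as a backslash followed by the letter t.
    public static string ReplaceDots(string input)
    {
        return Regex.Replace(input, @"\.", @"\t");
    }

    // In the pattern, $ anchors the match at the end of the string.
    // In the replacement, $$ is the substitution for one literal "$".
    public static string AppendDollar(string input)
    {
        return Regex.Replace(input, "$", "$$");
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceDots("a.b"));     // a\tb
        Console.WriteLine(AppendDollar("total"));  // total$
    }
}
```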
Note
For functionality similar to a replacement pattern within a regular expression, use a backreference. For more information about backreferences, see Backreference Constructs.
Substituting a Numbered Group
The $number language element includes the last substring matched by the number capturing group in the replacement string, where number is the index of the capturing group. For example, the replacement pattern $1 indicates that the matched substring is to be replaced by the first captured group. For more information about numbered capturing groups, see Grouping Constructs.
All digits that follow $ are interpreted as belonging to the number group. If this is not your intent, you can substitute a named group instead. For example, you can use the replacement string ${1}1 instead of $11 to define the replacement string as the value of the first captured group along with the number "1". For more information, see Substituting a Named Group.
Capturing groups that are not explicitly assigned names using the (?<name>) syntax are numbered from left to right starting at one. Named groups are also numbered from left to right, starting at one greater than the index of the last unnamed group. For example, in the regular expression (\w)(?<digit>\d), the index of the digit named group is 2.
If number does not specify a valid capturing group defined in the regular expression pattern, $number is interpreted as a literal character sequence that is used to replace each match.
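The digit-handling and numbering rules above can be sketched as follows. The class GroupNumberingDemo and its members are hypothetical names used only for illustration; Regex.GroupNumberFromName is the existing .NET method that reports a named group's index.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingDemo
{
    // ${1}1 is read as group 1 followed by the literal digit "1".
    public static string Braced(string input)
    {
        return Regex.Replace(input, @"(\w)", "${1}1");
    }

    // $11 is read as group 11, which this pattern does not define,
    // so the text "$11" is inserted literally.
    public static string Unbraced(string input)
    {
        return Regex.Replace(input, @"(\w)", "$11");
    }

    // Named groups are numbered after all unnamed groups, regardless of position.
    public static int DigitGroupNumber(string pattern)
    {
        return new Regex(pattern).GroupNumberFromName("digit");
    }

    public static void Main()
    {
        Console.WriteLine(Braced("a"));                            // a1
        Console.WriteLine(Unbraced("a"));                          // $11
        Console.WriteLine(DigitGroupNumber(@"(\w)(?<digit>\d)"));  // 2
        Console.WriteLine(DigitGroupNumber(@"(?<digit>\d)(\w)"));  // 2
    }
}
```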
The following example uses the $number substitution to strip the currency symbol from a decimal value. It removes currency symbols found at the beginning or end of a monetary value, and recognizes the two most common decimal separators ("." and ",").
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"\p{Sc}*(\s?\d+[.,]?\d*)\p{Sc}*";
        string replacement = "$1";
        string input = "$16.32 12.19 £16.29 €18.29 €18,29";
        string result = Regex.Replace(input, pattern, replacement);
        Console.WriteLine(result);
    }
}
// The example displays the following output:
//       16.32 12.19 16.29 18.29 18,29
The regular expression pattern \p{Sc}*(\s?\d+[.,]?\d*)\p{Sc}* is defined as shown in the following table.
Pattern | Description |
---|---|
\p{Sc}* | Match zero or more currency symbol characters. |
\s? | Match zero or one white-space characters. |
\d+ | Match one or more decimal digits. |
[.,]? | Match zero or one period or comma. |
\d* | Match zero or more decimal digits. |
(\s?\d+[.,]?\d*) | Match zero or one white-space characters, followed by one or more decimal digits, followed by zero or one period or comma, followed by zero or more decimal digits. This is the first capturing group. Because the replacement pattern is $1, the call to the Regex.Replace method replaces the entire matched substring with this captured group. |
Substituting a Named Group
The ${name} language element substitutes the last substring matched by the name capturing group, where name is the name of a capturing group defined by the (?<name>) language element. For more information about named capturing groups, see Grouping Constructs.
If name doesn't specify a valid named capturing group defined in the regular expression pattern but consists of digits, ${name} is interpreted as a numbered group.
If name specifies neither a valid named capturing group nor a valid numbered capturing group defined in the regular expression pattern, ${name} is interpreted as a literal character sequence that is used to replace each match.
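The three cases above (a valid name, a numeric name, and an unknown name) can be sketched as follows. The class NamedGroupDemo, the pattern, and the group name word are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Group 1 is the unnamed (\d+); the named group "word" captures the letters.
    private const string Pattern = @"(\d+)-(?<word>[a-z]+)";

    public static string Apply(string input, string replacement)
    {
        return Regex.Replace(input, Pattern, replacement);
    }

    public static void Main()
    {
        // Valid name and numeric name: both resolve to capturing groups.
        Console.WriteLine(Apply("42-abc", "${word}-${1}")); // abc-42
        // Unknown name: the text is inserted literally.
        Console.WriteLine(Apply("42-abc", "${missing}"));   // ${missing}
    }
}
```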
The following example uses the ${name} substitution to strip the currency symbol from a decimal value. It removes currency symbols found at the beginning or end of a monetary value, and recognizes the two most common decimal separators ("." and ",").
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"\p{Sc}*(?<amount>\s?\d+[.,]?\d*)\p{Sc}*";
        string replacement = "${amount}";
        string input = "$16.32 12.19 £16.29 €18.29 €18,29";
        string result = Regex.Replace(input, pattern, replacement);
        Console.WriteLine(result);
    }
}
// The example displays the following output:
//       16.32 12.19 16.29 18.29 18,29
The regular expression pattern \p{Sc}*(?<amount>\s?\d+[.,]?\d*)\p{Sc}* is defined as shown in the following table.
Pattern | Description |
---|---|
\p{Sc}* | Match zero or more currency symbol characters. |
\s? | Match zero or one white-space characters. |
\d+ | Match one or more decimal digits. |
[.,]? | Match zero or one period or comma. |
\d* | Match zero or more decimal digits. |
(?<amount>\s?\d+[.,]?\d*) | Match zero or one white-space characters, followed by one or more decimal digits, followed by zero or one period or comma, followed by zero or more decimal digits. This is the capturing group named amount. Because the replacement pattern is ${amount}, the call to the Regex.Replace method replaces the entire matched substring with this captured group. |
Substituting a "$" Character
The $$ substitution inserts a literal "$" character in the replaced string.
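As a minimal sketch of this rule, the following code prefixes each price with a dollar sign. The class DollarLiteralDemo and the method AddDollar are hypothetical names; the replacement string combines $$ with the $& substitution (the entire match).

```csharp
using System;
using System.Text.RegularExpressions;

public static class DollarLiteralDemo
{
    // "$$" yields one literal "$"; "$&" yields the matched text.
    public static string AddDollar(string input)
    {
        return Regex.Replace(input, @"\d+\.\d{2}", "$$$&");
    }

    public static void Main()
    {
        Console.WriteLine(AddDollar("19.99 and 5.00")); // $19.99 and $5.00
    }
}
```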
The following example uses the NumberFormatInfo object to determine the current culture's currency symbol and its placement in a currency string. It then builds both a regular expression pattern and a replacement pattern dynamically. If the example is run on a computer whose current culture is en-US, it generates the regular expression pattern \b(\d+)(\.(\d+))? and the replacement pattern $$ $1$2. The replacement pattern replaces the matched text with a currency symbol and a space followed by the first and second captured groups.
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // Define array of decimal values.
        string[] values = { "16.35", "19.72", "1234", "0.99" };
        // Determine whether currency precedes (True) or follows (False) number.
        bool precedes = NumberFormatInfo.CurrentInfo.CurrencyPositivePattern % 2 == 0;
        // Get decimal separator, escaped so that "." is matched literally.
        string cSeparator = Regex.Escape(NumberFormatInfo.CurrentInfo.CurrencyDecimalSeparator);
        // Get currency symbol.
        string symbol = NumberFormatInfo.CurrentInfo.CurrencySymbol;
        // If symbol is a "$", add an extra "$".
        if (symbol == "$") symbol = "$$";

        // Define regular expression pattern and replacement string.
        string pattern = @"\b(\d+)(" + cSeparator + @"(\d+))?";
        string replacement = "$1$2";
        replacement = precedes ? symbol + " " + replacement : replacement + " " + symbol;
        foreach (string value in values)
            Console.WriteLine("{0} --> {1}", value, Regex.Replace(value, pattern, replacement));
    }
}
// The example displays the following output:
//       16.35 --> $ 16.35
//       19.72 --> $ 19.72
//       1234 --> $ 1234
//       0.99 --> $ 0.99
The regular expression pattern \b(\d+)(\.(\d+))? is defined as shown in the following table.
Pattern | Description |
---|---|
\b | Start the match at the beginning of a word boundary. |
(\d+) | Match one or more decimal digits. This is the first capturing group. |
\. | Match a period (the decimal separator). |
(\d+) | Match one or more decimal digits. This is the third capturing group. |
(\.(\d+))? | Match zero or one occurrence of a period followed by one or more decimal digits. This is the second capturing group. |
Substituting the Entire Match
The $& substitution includes the entire match in the replacement string. Often, it is used to add a substring to the beginning or end of the matched string. For example, the ($&) replacement pattern adds parentheses to the beginning and end of each match. If there is no match, the $& substitution has no effect.
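The ($&) replacement pattern mentioned above can be sketched as follows. The class EntireMatchDemo and the method Parenthesize are hypothetical names, and the \w+ pattern is only an illustrative choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EntireMatchDemo
{
    // Wraps every run of word characters in parentheses.
    public static string Parenthesize(string input)
    {
        return Regex.Replace(input, @"\w+", "($&)");
    }

    public static void Main()
    {
        Console.WriteLine(Parenthesize("cat dog")); // (cat) (dog)
        // No match: the input is returned unchanged.
        Console.WriteLine(Parenthesize("!!!"));     // !!!
    }
}
```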
The following example uses the $& substitution to add quotation marks at the beginning and end of book titles stored in a string array.
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"^(\w+\s?)+$";
        string[] titles = { "A Tale of Two Cities",
                            "The Hound of the Baskervilles",
                            "The Protestant Ethic and the Spirit of Capitalism",
                            "The Origin of Species" };
        string replacement = "\"$&\"";
        foreach (string title in titles)
            Console.WriteLine(Regex.Replace(title, pattern, replacement));
    }
}
// The example displays the following output:
//       "A Tale of Two Cities"
//       "The Hound of the Baskervilles"
//       "The Protestant Ethic and the Spirit of Capitalism"
//       "The Origin of Species"
The regular expression pattern ^(\w+\s?)+$ is defined as shown in the following table.
Pattern | Description |
---|---|
^ | Start the match at the beginning of the input string. |
(\w+\s?)+ | Match the pattern of one or more word characters followed by zero or one white-space characters one or more times. |
$ | Match the end of the input string. |
The "$&" replacement pattern adds a literal quotation mark to the beginning and end of each match.
Substituting the Text Before the Match
The $` substitution replaces the matched string with the entire input string before the match. That is, it duplicates the input string up to the match while removing the matched text. Any text that follows the matched text is unchanged in the result string. If there are multiple matches in an input string, the replacement text is derived from the original input string, rather than from the string in which text has been replaced by earlier matches. (The example provides an illustration.) If there is no match, the $` substitution has no effect.
The following example uses the regular expression pattern \d+ to match a sequence of one or more decimal digits in the input string. The replacement string $` replaces these digits with the text that precedes the match.
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string input = "aa1bb2cc3dd4ee5";
        string pattern = @"\d+";
        string substitution = "$`";
        Console.WriteLine("Matches:");
        foreach (Match match in Regex.Matches(input, pattern))
            Console.WriteLine("   {0} at position {1}", match.Value, match.Index);

        Console.WriteLine("Input string:  {0}", input);
        Console.WriteLine("Output string: " +
                          Regex.Replace(input, pattern, substitution));
    }
}
// The example displays the following output:
//    Matches:
//       1 at position 2
//       2 at position 5
//       3 at position 8
//       4 at position 11
//       5 at position 14
//    Input string:  aa1bb2cc3dd4ee5
//    Output string: aaaabbaa1bbccaa1bb2ccddaa1bb2cc3ddeeaa1bb2cc3dd4ee
In this example, the input string "aa1bb2cc3dd4ee5" contains five matches. The following table illustrates how the $` substitution causes the regular expression engine to replace each match in the input string. Inserted text is shown in bold in the results column.
Match | Position | String before match | Result string |
---|---|---|---|
1 | 2 | aa | aa**aa**bb2cc3dd4ee5 |
2 | 5 | aa1bb | aaaabb**aa1bb**cc3dd4ee5 |
3 | 8 | aa1bb2cc | aaaabbaa1bbcc**aa1bb2cc**dd4ee5 |
4 | 11 | aa1bb2cc3dd | aaaabbaa1bbccaa1bb2ccdd**aa1bb2cc3dd**ee5 |
5 | 14 | aa1bb2cc3dd4ee | aaaabbaa1bbccaa1bb2ccddaa1bb2cc3ddee**aa1bb2cc3dd4ee** |
Substituting the Text After the Match
The $' substitution replaces the matched string with the entire input string after the match. That is, it duplicates the input string after the match while removing the matched text. Any text that precedes the matched text is unchanged in the result string. If there is no match, the $' substitution has no effect.
The following example uses the regular expression pattern \d+ to match a sequence of one or more decimal digits in the input string. The replacement string $' replaces these digits with the text that follows the match.
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string input = "aa1bb2cc3dd4ee5";
        string pattern = @"\d+";
        string substitution = "$'";
        Console.WriteLine("Matches:");
        foreach (Match match in Regex.Matches(input, pattern))
            Console.WriteLine("   {0} at position {1}", match.Value, match.Index);

        Console.WriteLine("Input string:  {0}", input);
        Console.WriteLine("Output string: " +
                          Regex.Replace(input, pattern, substitution));
    }
}
// The example displays the following output:
//    Matches:
//       1 at position 2
//       2 at position 5
//       3 at position 8
//       4 at position 11
//       5 at position 14
//    Input string:  aa1bb2cc3dd4ee5
//    Output string: aabb2cc3dd4ee5bbcc3dd4ee5ccdd4ee5ddee5ee
In this example, the input string "aa1bb2cc3dd4ee5" contains five matches. The following table illustrates how the $' substitution causes the regular expression engine to replace each match in the input string. Inserted text is shown in bold in the results column.
Match | Position | String after match | Result string |
---|---|---|---|
1 | 2 | bb2cc3dd4ee5 | aa**bb2cc3dd4ee5**bb2cc3dd4ee5 |
2 | 5 | cc3dd4ee5 | aabb2cc3dd4ee5bb**cc3dd4ee5**cc3dd4ee5 |
3 | 8 | dd4ee5 | aabb2cc3dd4ee5bbcc3dd4ee5cc**dd4ee5**dd4ee5 |
4 | 11 | ee5 | aabb2cc3dd4ee5bbcc3dd4ee5ccdd4ee5dd**ee5**ee5 |
5 | 14 | String.Empty | aabb2cc3dd4ee5bbcc3dd4ee5ccdd4ee5ddee5ee |
Substituting the Last Captured Group
The $+ substitution replaces the matched string with the last captured group. If there are no captured groups or if the value of the last captured group is String.Empty, the $+ substitution has no effect.
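When a pattern defines more than one capturing group, $+ refers to the last of them. The following is a minimal sketch under that assumption; the class LastGroupDemo, the method KeepLastGroup, and the year-month input are hypothetical and used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastGroupDemo
{
    // The pattern has two capturing groups; $+ inserts the second one.
    public static string KeepLastGroup(string input)
    {
        return Regex.Replace(input, @"(\d+)-(\d+)", "$+");
    }

    public static void Main()
    {
        Console.WriteLine(KeepLastGroup("2024-05")); // 05
    }
}
```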
The following example identifies duplicate words in a string and uses the $+ substitution to replace them with a single occurrence of the word. The RegexOptions.IgnoreCase option is used to ensure that words that differ in case but that are otherwise identical are considered duplicates.
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"\b(\w+)\s\1\b";
        string substitution = "$+";
        string input = "The the dog jumped over the fence fence.";
        Console.WriteLine(Regex.Replace(input, pattern, substitution,
                          RegexOptions.IgnoreCase));
    }
}
// The example displays the following output:
//      The dog jumped over the fence.
The regular expression pattern \b(\w+)\s\1\b is defined as shown in the following table.
Pattern | Description |
---|---|
\b | Begin the match at a word boundary. |
(\w+) | Match one or more word characters. This is the first capturing group. |
\s | Match a white-space character. |
\1 | Match the first captured group. |
\b | End the match at a word boundary. |
Substituting the Entire Input String
The $_ substitution replaces the matched string with the entire input string. That is, it removes the matched text and replaces it with the entire string, including the matched text.
The following example matches one or more decimal digits in the input string. It uses the $_ substitution to replace them with the entire input string.
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string input = "ABC123DEF456";
        string pattern = @"\d+";
        string substitution = "$_";
        Console.WriteLine("Original string:          {0}", input);
        Console.WriteLine("String with substitution: {0}",
                          Regex.Replace(input, pattern, substitution));
    }
}
// The example displays the following output:
//       Original string:          ABC123DEF456
//       String with substitution: ABCABC123DEF456DEFABC123DEF456
In this example, the input string "ABC123DEF456" contains two matches. The following table illustrates how the $_ substitution causes the regular expression engine to replace each match in the input string. Inserted text is shown in bold in the results column.
Match | Position | Matched text | Result string |
---|---|---|---|
1 | 3 | 123 | ABC**ABC123DEF456**DEF456 |
2 | 9 | 456 | ABCABC123DEF456DEF**ABC123DEF456** |
A regular expression is a pattern that the regular expression engine attempts to match in input text. A pattern consists of one or more character literals, operators, or constructs. For a brief introduction, see .NET Regular Expressions.
Each section in this quick reference lists a particular category of characters, operators, and constructs that you can use to define regular expressions:
Character escapes
Character classes
Anchors
Grouping constructs
Quantifiers
Backreference constructs
Alternation constructs
Substitutions
Regular expression options
Miscellaneous constructs
Character Escapes
The backslash character (\) in a regular expression indicates that the character that follows it either is a special character (as shown in the following table), or should be interpreted literally. For more information, see Character Escapes.
Escaped character | Description | Pattern | Matches |
---|---|---|---|
\a | Matches a bell character, \u0007. | \a | "\u0007" in "Error!" + '\u0007' |
\b | In a character class, matches a backspace, \u0008. | [\b]{3,} | "\b\b\b\b" in "\b\b\b\b" |
\t | Matches a tab, \u0009. | (\w+)\t | "item1\t", "item2\t" in "item1\titem2\t" |
\r | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | \r\n(\w+) | "\r\nThese" in "\r\nThese are\ntwo lines." |
\v | Matches a vertical tab, \u000B. | [\v]{2,} | "\v\v\v" in "\v\v\v" |
\f | Matches a form feed, \u000C. | [\f]{2,} | "\f\f\f" in "\f\f\f" |
\n | Matches a new line, \u000A. | \r\n(\w+) | "\r\nThese" in "\r\nThese are\ntwo lines." |
\e | Matches an escape, \u001B. | \e | "\x001B" in "\x001B" |
\nnn | Uses octal representation to specify a character (nnn consists of two or three digits). | \w\040\w | "a b", "c d" in "a bc d" |
\xnn | Uses hexadecimal representation to specify a character (nn consists of exactly two digits). | \w\x20\w | "a b", "c d" in "a bc d" |
\cX, \cx | Matches the ASCII control character that is specified by X or x, where X or x is the letter of the control character. | \cC | "\x0003" in "\x0003" (Ctrl-C) |
\unnnn | Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn). | \w\u0020\w | "a b", "c d" in "a bc d" |
\ | When followed by a character that is not recognized as an escaped character in this and other tables in this topic, matches that character. For example, \* is the same as \x2A, and \. is the same as \x2E. This allows the regular expression engine to disambiguate language elements (such as * or ?) and character literals (represented by \* or \?). | \d+[\+-x\*]\d+ | "2+2" and "3*9" in "(2+2) * 3*9" |
Character Classes
A character class matches any one of a set of characters. Character classes include the language elements listed in the following table. For more information, see Character Classes.
Character class | Description | Pattern | Matches |
---|---|---|---|
[character_group] | Matches any single character in character_group. By default, the match is case-sensitive. | [ae] | "a" in "gray"; "a", "e" in "lane" |
[^character_group] | Negation: Matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | [^aei] | "r", "g", "n" in "reign" |
[first-last] | Character range: Matches any single character in the range from first to last. | [A-Z] | "A", "B" in "AB123" |
. | Wildcard: Matches any single character except \n. To match a literal period character (. or \u002E), you must precede it with the escape character (\.). | a.e | "ave" in "nave"; "ate" in "water" |
\p{name} | Matches any single character in the Unicode general category or named block specified by name. | \p{Lu}; \p{IsCyrillic} | "C", "L" in "City Lights"; "Д", "Ж" in "ДЖem" |
\P{name} | Matches any single character that is not in the Unicode general category or named block specified by name. | \P{Lu}; \P{IsCyrillic} | "i", "t", "y" in "City"; "e", "m" in "ДЖem" |
\w | Matches any word character. | \w | "I", "D", "A", "1", "3" in "ID A1.3" |
\W | Matches any non-word character. | \W | " ", "." in "ID A1.3" |
\s | Matches any white-space character. | \w\s | "D " in "ID A1.3" |
\S | Matches any non-white-space character. | \s\S | " _" in "int __ctr" |
\d | Matches any decimal digit. | \d | "4" in "4 = IV" |
\D | Matches any character other than a decimal digit. | \D | " ", "=", " ", "I", "V" in "4 = IV" |
Anchors
Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters. The metacharacters listed in the following table are anchors. For more information, see Anchors.
Assertion | Description | Pattern | Matches |
---|---|---|---|
^ | By default, the match must start at the beginning of the string; in multiline mode, it must start at the beginning of the line. | ^\d{3} | "901" in "901-333-" |
$ | By default, the match must occur at the end of the string or before \n at the end of the string; in multiline mode, it must occur before the end of the line or before \n at the end of the line. | -\d{3}$ | "-333" in "-901-333" |
\A | The match must occur at the start of the string. | \A\d{3} | "901" in "901-333-" |
\Z | The match must occur at the end of the string or before \n at the end of the string. | -\d{3}\Z | "-333" in "-901-333" |
\z | The match must occur at the end of the string. | -\d{3}\z | "-333" in "-901-333" |
\G | The match must occur at the point where the previous match ended. | \G\(\d\) | "(1)", "(3)", "(5)" in "(1)(3)(5)[7](9)" |
\b | The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character. | \b\w+\s\w+\b | "them theme", "them them" in "them theme them them" |
\B | The match must not occur on a \b boundary. | \Bend\w*\b | "ends", "ender" in "end sends endure lender" |
Grouping Constructs
Grouping constructs delineate subexpressions of a regular expression and typically capture substrings of an input string. Grouping constructs include the language elements listed in the following table. For more information, see Grouping Constructs.
Grouping construct | Description | Pattern | Matches |
---|---|---|---|
(subexpression) | Captures the matched subexpression and assigns it a one-based ordinal number. | (\w)\1 | "ee" in "deep" |
(?<name>subexpression) | Captures the matched subexpression into a named group. | (?<double>\w)\k<double> | "ee" in "deep" |
(?<name1-name2>subexpression) | Defines a balancing group definition. For more information, see the "Balancing Group Definition" section in Grouping Constructs. | (((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$ | "((1-3)*(3-1))" in "3+2^((1-3)*(3-1))" |
(?:subexpression) | Defines a noncapturing group. | Write(?:Line)? | "WriteLine" in "Console.WriteLine()"; "Write" in "Console.Write(value)" |
(?imnsx-imnsx:subexpression) | Applies or disables the specified options within subexpression. For more information, see Regular Expression Options. | A\d{2}(?i:\w+)\b | "A12xl", "A12XL" in "A12xl A12XL a12xl" |
(?=subexpression) | Zero-width positive lookahead assertion. | \w+(?=\.) | "is", "ran", and "out" in "He is. The dog ran. The sun is out." |
(?!subexpression) | Zero-width negative lookahead assertion. | \b(?!un)\w+\b | "sure", "used" in "unsure sure unity used" |
(?<=subexpression) | Zero-width positive lookbehind assertion. | (?<=19)\d{2}\b | "99", "50", "05" in "1851 1999 1950 1905 2003" |
(?<!subexpression) | Zero-width negative lookbehind assertion. | (?<!19)\d{2}\b | "51", "03" in "1851 1999 1950 1905 2003" |
(?>subexpression) | Nonbacktracking (or "greedy") subexpression. | [13579](?>A+B+) | "1ABB", "3ABB", and "5AB" in "1ABB 3ABBC 5AB 5AC" |
Quantifiers
A quantifier specifies how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur. Quantifiers include the language elements listed in the following table. For more information, see Quantifiers.
Quantifier | Description | Pattern | Matches |
---|---|---|---|
* | Matches the previous element zero or more times. | \d*\.\d | ".0", "19.9", "219.9" |
+ | Matches the previous element one or more times. | be+ | "bee" in "been", "be" in "bent" |
? | Matches the previous element zero or one time. | rai?n | "ran", "rain" |
{n} | Matches the previous element exactly n times. | ,\d{3} | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
{n,} | Matches the previous element at least n times. | \d{2,} | "166", "29", "1930" |
{n,m} | Matches the previous element at least n times, but no more than m times. | \d{3,5} | "166", "17668"; "19302" in "193024" |
*? | Matches the previous element zero or more times, but as few times as possible. | \d*?\.\d | ".0", "19.9", "219.9" |
+? | Matches the previous element one or more times, but as few times as possible. | be+? | "be" in "been", "be" in "bent" |
?? | Matches the previous element zero or one time, but as few times as possible. | rai??n | "ran", "rain" |
{n}? | Matches the preceding element exactly n times. | ,\d{3}? | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
{n,}? | Matches the previous element at least n times, but as few times as possible. | \d{2,}? | "166", "29", "1930" |
{n,m}? | Matches the previous element between n and m times, but as few times as possible. | \d{3,5}? | "166", "17668"; "193", "024" in "193024" |
Backreference Constructs
A backreference allows a previously matched subexpression to be identified subsequently in the same regular expression. The following table lists the backreference constructs supported by regular expressions in .NET. For more information, see Backreference Constructs.
Backreference construct | Description | Pattern | Matches |
---|---|---|---|
\ number | Backreference. Matches the value of a numbered subexpression. | (\w)\1 | "ee" in "seek" |
\k< name > | Named backreference. Matches the value of a named expression. | (?<char>\w)\k<char> | "ee" in "seek" |
Alternation Constructs
Alternation constructs modify a regular expression to enable either/or matching. These constructs include the language elements listed in the following table. For more information, see Alternation Constructs.
Alternation construct | Description | Pattern | Matches |
---|---|---|---|
| | Matches any one element separated by the vertical bar (|) character. | th(e|is|at) | "the", "this" in "this is the day." |
(?( expression ) yes | no ) | Matches yes if the regular expression pattern designated by expression matches; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion. | (?(A)A\d{2}\b|\b\d{3}\b) | "A10", "910" in "A10 C103 910" |
(?( name ) yes | no ) | Matches yes if name, a named or numbered capturing group, has a match; otherwise, matches the optional no. | (?<quoted>")?(?(quoted).+?"|\S+\s) | "Dogs.jpg ", "\"Yiska playing.jpg\"" in "Dogs.jpg \"Yiska playing.jpg\"" |
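Simple alternation and the group-based conditional can be tried out as follows. This is a sketch in Python's `re` module, with two assumptions to keep in mind: the named group is written in Python's `(?P<quoted>...)` syntax, and Python supports only the `(?(name)yes|no)` form of the conditional, so the expression-based form `(?(expression)yes|no)` is not shown.

```python
import re

# Alternation: "th" followed by "e", "is", or "at". A noncapturing group is
# used so that findall returns whole matches rather than the group contents.
alternation = re.findall(r"th(?:e|is|at)", "this is the day.")

# Conditional: if the optional opening quote was captured, read up to the
# closing quote; otherwise read a run of non-space characters and one space.
conditional = [
    m.group(0)
    for m in re.finditer(r'(?P<quoted>")?(?(quoted).+?"|\S+\s)',
                         'Dogs.jpg "Yiska playing.jpg"')
]

print(alternation)  # ['this', 'the']
print(conditional)  # ['Dogs.jpg ', '"Yiska playing.jpg"']
```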
Substitutions
Substitutions are regular expression language elements that are supported in replacement patterns. For more information, see Substitutions.
Character | Description | Pattern | Replacement pattern | Input string | Result string |
---|---|---|---|---|---|
$ number | Substitutes the substring matched by group number. | \b(\w+)(\s)(\w+)\b | $3$2$1 | "one two" | "two one" |
${ name } | Substitutes the substring matched by the named group name. | \b(?<word1>\w+)(\s)(?<word2>\w+)\b | ${word2} ${word1} | "one two" | "two one" |
$$ | Substitutes a literal "$". | \b(\d+)\s?USD | $$$1 | "103 USD" | "$103" |
$& | Substitutes a copy of the whole match. | \$?\d*\.?\d+ | **$&** | "$1.30" | "**$1.30**" |
$` | Substitutes all the text of the input string before the match. | B+ | $` | "AABBCC" | "AAAACC" |
$' | Substitutes all the text of the input string after the match. | B+ | $' | "AABBCC" | "AACCCC" |
$+ | Substitutes the last group that was captured. | B+(C+) | $+ | "AABBCCDD" | "AACCDD" |
$_ | Substitutes the entire input string. | B+ | $_ | "AABBCC" | "AAAABBCCCC" |
Regular Expression Options
You can specify options that control how the regular expression engine interprets a regular expression pattern. Many of these options can be specified either inline (in the regular expression pattern) or as one or more RegexOptions constants. This quick reference lists only inline options. For more information about inline and RegexOptions options, see the article Regular Expression Options.
You can specify an inline option in two ways:
By using the miscellaneous construct (?imnsx-imnsx), where a minus sign (-) before an option or set of options turns those options off. For example, (?i-mn) turns case-insensitive matching (i) on, turns multiline mode (m) off, and turns explicit capture (n) off, so that unnamed groups are captured. The option applies to the regular expression pattern from the point at which the option is defined, and is effective either to the end of the pattern or to the point where another construct reverses the option.
By using the grouping construct (?imnsx-imnsx: subexpression), which defines options for the specified group only.
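The contrast between a whole-pattern option and a group-scoped option can be sketched as follows. The example uses Python's `re` module, with assumptions that differ from .NET: Python accepts a bare `(?i)`, `(?m)`, `(?s)`, or `(?x)` only at the start of the pattern and has no mid-pattern `(?-i)` toggle, so the scoped form `(?i:...)` stands in for switching an option on and off partway through. The `n` option has no Python counterpart and is omitted.

```python
import re

text = "aardvark AAAuto aaaAuto Adam breakfast"

# Option applied to the whole pattern: every letter is case-insensitive.
whole_pattern = re.findall(r"(?i)\baa\w+\b", text)

# Option scoped to a group: only the first "a" is case-insensitive.
group_only = re.findall(r"\b(?i:a)a\w+\b", text)

# m: ^ matches at the start of every line, not only the start of the string.
without_m = re.findall(r"^\w+", "one\ntwo")
with_m = re.findall(r"(?m)^\w+", "one\ntwo")

# s: the period also matches a newline.
without_s = re.search(r"a.b", "a\nb")
with_s = re.search(r"(?s)a.b", "a\nb")

# x: unescaped white space in the pattern is ignored.
with_x = re.findall(r"(?x)\b \d+ \s \w+", "1 aardvark 2 cats IV centurions")

print(whole_pattern)           # ['aardvark', 'AAAuto', 'aaaAuto']
print(group_only)              # ['aardvark', 'aaaAuto']
print(without_m, with_m)       # ['one'] ['one', 'two']
print(without_s is None, with_s is not None)  # True True
print(with_x)                  # ['1 aardvark', '2 cats']
```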
The .NET regular expression engine supports the following inline options.
Option | Description | Pattern | Matches |
---|---|---|---|
i | Use case-insensitive matching. | \b(?i)a(?-i)a\w+\b | "aardvark", "aaaAuto" in "aardvark AAAuto aaaAuto Adam breakfast" |
m | Use multiline mode. ^ and $ match the beginning and end of a line, instead of the beginning and end of a string. | For an example, see the "Multiline Mode" section in Regular Expression Options. | |
n | Do not capture unnamed groups. | For an example, see the "Explicit Captures Only" section in Regular Expression Options. | |
s | Use single-line mode, in which the period (.) matches every character, including the newline character. | For an example, see the "Single-line Mode" section in Regular Expression Options. | |
x | Ignore unescaped white space in the regular expression pattern. | \b(?x) \d+ \s \w+ | "1 aardvark", "2 cats" in "1 aardvark 2 cats IV centurions" |
Miscellaneous Constructs
Miscellaneous constructs either modify a regular expression pattern or provide information about it. The following table lists the miscellaneous constructs supported by .NET. For more information, see Miscellaneous Constructs.
Construct | Definition | Example |
---|---|---|
(?imnsx-imnsx) | Sets or disables options such as case insensitivity in the middle of a pattern. For more information, see Regular Expression Options. | \bA(?i)b\w+\b matches "ABA", "Able" in "ABA Able Act" |
(?# comment ) | Inline comment. The comment ends at the first closing parenthesis. | \bA(?#Matches words starting with A)\w+\b |
# [to end of line] | X-mode comment. The comment starts at an unescaped # and continues to the end of the line. | (?x)\bA\w+\b#Matches words starting with A |
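Neither comment form changes what a pattern matches. The short sketch below checks this with Python's `re` module, which is assumed to handle `(?#...)` and x-mode `#` comments the same way as .NET for these two patterns.

```python
import re

text = "ABA Able Act"

# The same pattern three ways: bare, with an inline comment, and with an
# x-mode comment that runs from the unescaped # to the end of the line.
plain = re.findall(r"\bA\w+\b", text)
inline_comment = re.findall(r"\bA(?#Matches words starting with A)\w+\b", text)
x_mode_comment = re.findall(r"(?x)\bA\w+\b#Matches words starting with A", text)

print(plain)                                    # ['ABA', 'Able', 'Act']
print(plain == inline_comment == x_mode_comment)  # True
```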