Chapter 3
Introduction to Regular Expressions
Dr. Hadeel Alazzam
Scripting Programming

Outline
• Regular Expression
• Commands in Use
• grep and egrep
• Regular Expression Metacharacters
• Grouping
• Brackets and Character Classes
• Back References
• Quantifiers
• Anchors and Word Boundaries
• Practical Examples

Regular Expression
• Regular expressions (regex) are a powerful method for describing a text pattern to be matched by various tools.
• There is only one place in bash where regular expressions are valid: the =~ comparison in the [[ compound command, as in an if statement.
• Regular expressions are a crucial part of the larger toolkit for commands like grep, awk, and sed in particular.
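
A minimal sketch of the =~ comparison mentioned above, assuming bash (the [[ command is not available in plain sh). The variable names line, re, and matched are illustrative only; the pattern is kept in a variable to avoid quoting surprises.

```shell
#!/usr/bin/env bash
line='Two roads diverged in a yellow wood,'
re='T.o'   # T, then any single character, then o

# [[ string =~ regex ]] succeeds when the regex matches somewhere in the string.
if [[ $line =~ $re ]]; then
    matched="${BASH_REMATCH[0]}"   # the text that the whole pattern matched
else
    matched=""
fi
echo "matched: $matched"   # prints: matched: Two
```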

Regular Expression vs. Pattern Matching
• Pattern matching (globbing) is performed by the shell, for example when it expands the file names passed to a command such as ls.
• Regular expressions are used to search for strings of text in a file by using commands such as grep.
• The use of regular expressions is generally associated with text processing.

Commands in Use
• grep: The grep command searches the content of files for a given pattern and prints any line where the pattern is matched.
• To use grep, you need to provide it with a pattern and one or more filenames (or piped data).
• Common command options:
  • -c: Count the number of lines that match the pattern.
  • -E: Enable extended regular expressions.
  • -f: Read the search patterns from a provided file. The file can contain more than one pattern, with each line containing a single pattern.
  • -i: Ignore character case.
  • -l: Print only the name (with path) of each file in which the pattern was found.
  • -n: Print the line number of each line where the pattern was found.
  • -P: Enable the Perl regular expression engine (note the uppercase P).
  • -R, -r: Recursively search subdirectories.
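
The options above can be tried on a small scratch file. This is only a sketch: the file name notes.txt and its contents are made up for the demonstration.

```shell
# Scratch file with hypothetical contents, created only for this demonstration.
workdir=$(mktemp -d)
printf '%s\n' 'Password: hunter2' 'user: alice' 'reset your password here' > "$workdir/notes.txt"

count=$(grep -c 'password' "$workdir/notes.txt")        # -c: count matching lines (case sensitive)
count_i=$(grep -c -i 'password' "$workdir/notes.txt")   # -i: ignore case
numbered=$(grep -n -i 'password' "$workdir/notes.txt")  # -n: prefix each hit with its line number
files=$(cd "$workdir" && grep -l 'alice' notes.txt)     # -l: print only the file name

echo "$count $count_i"   # prints: 1 2
echo "$numbered"
echo "$files"            # prints: notes.txt
rm -r "$workdir"
```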

Commands in Use
• In general, grep is used like this:
  grep options pattern filenames
• To search the /home directory and all subdirectories for files containing the word password, regardless of uppercase/lowercase distinctions:
  grep -R -i 'password' /home
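
To try a recursive, case-insensitive search without touching /home, the sketch below builds a small scratch tree. The directory and file names are hypothetical, and -l is added so that only file names are printed.

```shell
# Build a scratch tree (hypothetical names) so the search has something to find.
root=$(mktemp -d)
mkdir -p "$root/alice/docs" "$root/bob"
echo 'My PASSWORD is secret' > "$root/alice/docs/todo.txt"
echo 'nothing to see here'   > "$root/bob/readme.txt"

# -R descends into subdirectories, -i ignores case, -l lists only matching file names.
hits=$(grep -R -i -l 'password' "$root")
echo "$hits"
rm -r "$root"
```

Replacing "$root" with /home gives the search described on the slide; on a real system it may report permission errors for directories the user cannot read.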

grep and egrep
• The grep command supports some variations, notably an extended syntax for regex patterns.
• There are three ways to tell grep that you want special meaning on certain characters:
  1. By preceding those characters with a backslash.
  2. By telling grep that you want the special syntax (without the need for a backslash) by using the -E option when you invoke grep.
  3. By using the command named egrep, which is a script that simply invokes grep as grep -E so you don't have to.
• The only characters that are affected by the extended syntax are ? + { | ( and ).
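
The difference between the basic and extended spellings can be sketched as follows. The sample line and variable names are illustrative, and the backslash form \? is assumed to behave as in GNU grep. Recent GNU grep releases print a warning when egrep is invoked, so the sketch uses grep -E instead.

```shell
sample='To where it bent in the undergrowth;'

# Basic syntax: the backslash gives ? its special "optional" meaning.
basic=$(printf '%s\n' "$sample" | grep -c 'T.\?o')

# Extended syntax: with -E, ? is special without a backslash.
extended=$(printf '%s\n' "$sample" | grep -E -c 'T.?o')

# Basic syntax without a backslash: ? is an ordinary character, so nothing matches here.
literal=$(printf '%s\n' "$sample" | grep -c 'T.?o' || true)

echo "$basic $extended $literal"   # prints: 1 1 0
```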

Regular Expression Metacharacters
• Regular expressions are patterns that are created using a series of characters and metacharacters.
• Metacharacters such as the question mark (?) and asterisk (*) have special meaning beyond their literal meanings in regex.
• The seven lines of the file frost.txt will be used in the examples on the following slides.

Regular Expression Metacharacters
frost.txt:
1 Two roads diverged in a yellow wood,
2 And sorry I could not travel both
3 And be one traveler, long I stood
4 And looked down one as far as I could
5 To where it bent in the undergrowth;
6
7 Excerpt from The Road Not Taken by Robert Frost

Regular Expression Metacharacters
• The "." metacharacter:
  • The period (.) represents a single wildcard character.
  • It will match on any single character except for a newline.
  • If you want to treat this metacharacter as a period character rather than a wildcard, precede it with a backslash (\.) to escape its special meaning.
• If we try to match on the pattern T.o, the first line of the frost.txt file is returned because it contains the word Two.
• Regex patterns are also case sensitive, which is why line 3 of the file is not returned even though it contains the string too (inside stood).
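
As a rough sketch of the behavior described above, the following recreates frost.txt in a scratch directory and searches it for T.o with plain grep. The file contents are the numbered lines from the earlier slide; the directory and variable names are only illustrative.

```shell
workdir=$(mktemp -d)
cat > "$workdir/frost.txt" <<'EOF'
1 Two roads diverged in a yellow wood,
2 And sorry I could not travel both
3 And be one traveler, long I stood
4 And looked down one as far as I could
5 To where it bent in the undergrowth;
6
7 Excerpt from The Road Not Taken by Robert Frost
EOF

# T, then exactly one character of any kind, then o.
dot_hits=$(grep 'T.o' "$workdir/frost.txt")
echo "$dot_hits"   # prints: 1 Two roads diverged in a yellow wood,
rm -r "$workdir"
```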

Regular Expression Metacharacters
• The "?" metacharacter:
  • The question mark (?) makes the item that precedes it optional.
  • It matches that item zero or one time.
• The pattern T.?o (run here as egrep 'T.?o' frost.txt) will match on any three-character sequence that begins with T and ends with o, as well as the two-character sequence To.
• Note that we are using egrep here.
  • We could have used grep -E,
  • or we could have used "plain" grep with a slightly different pattern, T.\?o, putting the backslash on the question mark to give it the extended meaning.
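
A small sketch of the optional-item behavior, again recreating frost.txt in a scratch directory (names are illustrative). It uses grep -E, which the slide lists as equivalent to egrep.

```shell
workdir=$(mktemp -d)
cat > "$workdir/frost.txt" <<'EOF'
1 Two roads diverged in a yellow wood,
2 And sorry I could not travel both
3 And be one traveler, long I stood
4 And looked down one as far as I could
5 To where it bent in the undergrowth;
6
7 Excerpt from The Road Not Taken by Robert Frost
EOF

# The character between T and o is optional, so both Two and To qualify.
optional_hits=$(grep -E 'T.?o' "$workdir/frost.txt")
echo "$optional_hits"   # prints lines 1 and 5 of the file
rm -r "$workdir"
```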

Regular Expression Metacharacters
• The "*" metacharacter:
  • The asterisk (*) is a special character that matches the preceding item zero or more times.
  • It is similar to ?, the main difference being that the previous item may appear more than once.
• The .* in the pattern T.*o allows any number of any character to appear between the T and o.
• Thus, the last line also matches because it contains the pattern The Ro.
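
The effect of .* can be checked with the sketch below, which rebuilds frost.txt in a scratch directory (names are illustrative) and counts the lines matched by T.*o.

```shell
workdir=$(mktemp -d)
cat > "$workdir/frost.txt" <<'EOF'
1 Two roads diverged in a yellow wood,
2 And sorry I could not travel both
3 And be one traveler, long I stood
4 And looked down one as far as I could
5 To where it bent in the undergrowth;
6
7 Excerpt from The Road Not Taken by Robert Frost
EOF

# Any run of characters (including none) may sit between T and o.
star_hits=$(grep 'T.*o' "$workdir/frost.txt")
star_count=$(printf '%s\n' "$star_hits" | wc -l | tr -d ' ')
echo "$star_hits"
echo "$star_count"   # prints: 3 (lines 1, 5, and 7 of the file)
rm -r "$workdir"
```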

Regular Expression Metacharacters
• The "+" metacharacter:
  • The plus sign (+) metacharacter is the same as the * except it requires the preceding item to appear at least once.
• The pattern T.+o specifies one or more of any character to appear in between the T and o.
  • The first line of text matches because of Two: the w is one character between the T and the o.
  • The second matching line (line 5 of the file) doesn't match on the To, as in the previous example; rather, the pattern matches a much larger string, all the way to the o in undergrowth.
  • The last line also matches because it contains the pattern The Ro.
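
To see which text the + form actually matches on line 5, the sketch below adds grep's -o option, which prints only the matched part of a line. The scratch directory and variable names are illustrative.

```shell
workdir=$(mktemp -d)
cat > "$workdir/frost.txt" <<'EOF'
1 Two roads diverged in a yellow wood,
2 And sorry I could not travel both
3 And be one traveler, long I stood
4 And looked down one as far as I could
5 To where it bent in the undergrowth;
6
7 Excerpt from The Road Not Taken by Robert Frost
EOF

# At least one character is required between T and o.
plus_count=$(grep -E -c 'T.+o' "$workdir/frost.txt")

# On line 5 the match cannot be just "To"; it stretches to the o in undergrowth.
line5_match=$(grep '^5' "$workdir/frost.txt" | grep -E -o 'T.+o')

echo "$plus_count"    # prints: 3
echo "$line5_match"   # prints: To where it bent in the undergro
rm -r "$workdir"
```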

Grouping
• We can use parentheses to group characters.
• Among other things, this allows us to treat the characters appearing inside the parentheses as a single item that we can later reference.
• In the pattern And be one (stranger|traveler), long I stood, we use parentheses and the Boolean OR operator (|) to create a pattern that will match on line 3.
• Line 3 as written has the word traveler in it, but this pattern would match even if traveler was replaced by the word stranger.
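
A sketch of grouping with alternation against frost.txt, rebuilt in a scratch directory. The second check feeds in a hypothetical variant of line 3 to show that the stranger alternative also matches.

```shell
workdir=$(mktemp -d)
cat > "$workdir/frost.txt" <<'EOF'
1 Two roads diverged in a yellow wood,
2 And sorry I could not travel both
3 And be one traveler, long I stood
4 And looked down one as far as I could
5 To where it bent in the undergrowth;
6
7 Excerpt from The Road Not Taken by Robert Frost
EOF

pattern='And be one (stranger|traveler), long I stood'

group_hits=$(grep -E "$pattern" "$workdir/frost.txt")
# Hypothetical rewording of line 3, not part of the file.
variant_count=$(printf '%s\n' 'And be one stranger, long I stood' | grep -E -c "$pattern")

echo "$group_hits"      # prints: 3 And be one traveler, long I stood
echo "$variant_count"   # prints: 1
rm -r "$workdir"
```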

Brackets and Character Classes
• The square brackets, [ ], are used to define character classes and lists of acceptable characters.
• Using this construct, you can list exactly which characters are matched at this position in the pattern.
• This is particularly useful when trying to perform user-input validation.
• As shorthand, you can specify ranges with a dash, such as [a-j].
  • These ranges follow your locale's collating sequence and alphabet.
  • The pattern [a-j] will match any one of the letters a through j.

Brackets and Character Classes
• Table 3-1 provides a list of common examples when using character classes and ranges.
• Be careful when defining a range for digits; the range can at most go from 0 to 9. For example, the pattern [1-475] does not match on numbers between 1 and 475; it matches on any one of the digits (characters) in the range 1 to 4 or the character 7 or the character 5.
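
The two bracket patterns discussed above can be checked with a short sketch. The input words and numbers are made up, and LC_ALL=C is set as an assumption so that the range a-j is not affected by the locale's collating sequence.

```shell
# [a-j] matches exactly one character in the range a through j.
in_range=$(printf '%s\n' apple kiwi fig | LC_ALL=C grep -c '^[a-j]')

# [1-475] is a set of single characters (1, 2, 3, 4, 7, 5), not the numbers 1 to 475.
# -x requires the whole line to match, so multi-digit input is rejected.
digits=$(printf '%s\n' 3 7 9 475 | LC_ALL=C grep -x '[1-475]')

echo "$in_range"   # prints: 2 (apple and fig)
echo "$digits"     # prints 3 and 7, one per line
```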

Brackets and Character Classes
• There are also predefined character classes known as shortcuts.
• These can be used to indicate common character classes such as numbers or letters.

Brackets and Character Classes
• Note that the shortcuts are not reliably supported by egrep.
• In order to use them, you must use grep with the -P option.
• That option enables the Perl regular expression engine to support the shortcuts.
• Note: the option is -P (capital letter).
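
A hedged sketch of a Perl-style shortcut with grep -P. It assumes \d (any digit) as the shortcut being tested; because some grep builds are compiled without the Perl engine, the sketch first probes for -P and otherwise falls back to the equivalent bracket form. The input lines are made up.

```shell
# \d is a Perl-style shortcut for a digit; -P (capital) selects the Perl-compatible engine.
if printf '1\n' | grep -q -P '\d' 2>/dev/null; then
    digit_lines=$(printf '%s\n' 'room 101' 'no digits' | grep -P '\d')
else
    # Fallback for grep builds without -P: the bracket form works everywhere.
    digit_lines=$(printf '%s\n' 'room 101' 'no digits' | grep '[0-9]')
fi
echo "$digit_lines"   # prints: room 101
```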

Brackets and Character Classes
• Other character classes are valid only within the bracket syntax, as shown in Table 3-3.
• They match a single character, so if you need to match many in a row, use the * or + to get the repetition you need.
• To use one of these classes, it has to be inside the brackets, so you end up with two sets of brackets.
• The pattern X[[:upper:][:digit:]] will match any line with an X followed by any uppercase letter or digit. It would match, for example, lines containing XT or X7.
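
The double-bracket form can be sketched as follows. The three input lines are hypothetical and are not the ones shown on the original slide.

```shell
# [:upper:] and [:digit:] are only valid inside an outer pair of brackets.
ids=$(printf '%s\n' 'User: XTjohnson' 'an X7wing model' 'an Xwing model' |
    grep 'X[[:upper:][:digit:]]')
echo "$ids"   # prints the first two lines; "Xwing" has a lowercase letter after X
```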

Back References
• Regex back references are one of the most powerful and often confusing regex operations.
• Consider a file, tags.txt, whose lines contain HTML tags.
• Suppose you want to write a regular expression that will extract any line that contains a matching pair of complete HTML tags.
• The start tag has an HTML tag name; the ending tag has the same tag name but with a leading slash. <div> and </div> are a matching pair.
• You can search for these by writing a lengthy regex that contains all possible HTML tag values, or you can focus on the format of an HTML tag and use a regex back reference, as follows:
  egrep '<([A-Za-z]*)>.*</\1>' tags.txt
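
A sketch of the back reference in action. The contents of tags.txt below are hypothetical stand-ins, since the original file is not reproduced in these slides, and grep -E is used in place of egrep.

```shell
workdir=$(mktemp -d)
# Hypothetical contents for tags.txt.
cat > "$workdir/tags.txt" <<'EOF'
1 <i>line</i>
2 <div>line</div>
3 <i>line</b>
4 <div>line</span>
EOF

# \1 must repeat whatever ([A-Za-z]*) matched, so only properly paired tags survive.
paired=$(grep -E '<([A-Za-z]*)>.*</\1>' "$workdir/tags.txt")
echo "$paired"   # prints lines 1 and 2 of the file
rm -r "$workdir"
```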

Back References
• In this example, the back reference is the \1 appearing in the latter part of the regular expression.
• It refers back to the expression enclosed in the first set of parentheses, [A-Za-z]*, which has two parts.
  • The letter range in brackets denotes a choice of any letter, uppercase or lowercase.
  • The * that follows it means to repeat that zero or more times.
• Therefore, the \1 refers to whatever was matched by that pattern in parentheses.
• If [A-Za-z]* matches div, then the \1 also refers to the text div.

Back References
• You can have more than one back reference in an expression and refer to each with a \1 or \2 or \3, depending on its order in the regular expression.
• A \1 refers to the first set of parentheses, \2 to the second, and so on.
• Note that the parentheses are metacharacters; they have a special meaning.
• If you just want to match a literal parenthesis in the extended syntax, you need to escape its special meaning by preceding it with a backslash, as in sin\([0-9.]*\) to match expressions like sin(6.2) or sin(3.14159).
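
Both points above, multiple back references and escaped parentheses, can be sketched with grep -E. The input words and expressions are made up for illustration.

```shell
# Two groups, two back references: \2 then \1 must repeat the captured characters in reverse.
# -x requires the whole line to match.
mirrored=$(printf '%s\n' abba abab noon | grep -E -x '(.)(.)\2\1')

# Escaped parentheses are literal characters in the extended syntax.
calls=$(printf '%s\n' 'y = sin(6.2)' 'y = sin x' 'z = sin(3.14159)' |
    grep -E 'sin\([0-9.]*\)')

echo "$mirrored"   # prints abba and noon, one per line
echo "$calls"      # prints the first and third input lines
```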

Quantifiers
• Quantifiers specify the number of times an item must appear in a string.
• Quantifiers are defined by curly braces { }.
  • For example, the pattern T{5} means that the letter T must appear consecutively exactly five times.
  • The pattern T{3,6} means that the letter T must appear consecutively three to six times.
  • The pattern T{5,} means that the letter T must appear consecutively five or more times.
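
The three quantifier forms can be sketched with grep -E. The -x option is an addition here: it forces the whole line to match, because without it T{5} would also be found inside a longer run of T characters.

```shell
# -x anchors the pattern to the whole line, so each count is checked exactly.
exactly5=$(printf '%s\n' TTTT TTTTT TTTTTT | grep -E -x 'T{5}')
three_to_six=$(printf '%s\n' TT TTT TTTTTT TTTTTTT | grep -E -x -c 'T{3,6}')
five_plus=$(printf '%s\n' TTTT TTTTT TTTTTTTT | grep -E -x -c 'T{5,}')

echo "$exactly5 $three_to_six $five_plus"   # prints: TTTTT 2 2
```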

Anchors and Word Boundaries
• You can use anchors to specify that a pattern must exist at the beginning or the end of a string.
• The caret (^) character is used to anchor a pattern to the beginning of a string.
  • For example, ^[1-5] means that a matching string must start with one of the digits 1 through 5, as the first character on the line.
• The $ character is used to anchor a pattern to the end of a string or line.
  • For example, [1-5]$ means that a string must end with one of the digits 1 through 5.
• In addition, you can use \b to identify a word boundary, that is, the position between a word character (letter, digit, or underscore) and a non-word character such as a space, or the start or end of the line.
• The pattern \b[1-5]\b will match on any of the digits 1 through 5, where the digit appears as its own word.
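
A short sketch of the anchors and the word boundary. The input lines are made up, and \b is assumed to behave as in GNU grep; other grep implementations may differ.

```shell
# ^ anchors to the start of the line, $ to the end.
starts=$(printf '%s\n' '3 apples' 'apples 3' '9 pears' | grep -c '^[1-5]')
ends=$(printf '%s\n' '3 apples' 'apples 3' 'pears 9' | grep -c '[1-5]$')

# \b[1-5]\b requires the digit to stand alone as its own word.
own_word=$(printf '%s\n' 'take 4 steps' 'route 45' 'a4 paper' | grep '\b[1-5]\b')

echo "$starts $ends"   # prints: 1 1
echo "$own_word"       # prints: take 4 steps
```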

Practical Examples
End
Dr. Aryaf Al-adwan, Autonomous Systems Dept

Chapter 3: Introduction to Regular Expression

  • 1. Chapter 3 Introduction to Regular Expressions 1 Dr. Hadeel Alazzam Scripting Programming 1
  • 2. 2 ? Regular Expression ? Commands in use ? grep and egrep ? Regular Expression Metacharacters ? Grouping ? Brackets and Character Classes ? Back References ? Quantifiers ? Anchors and Word Boundaries ? Practical Examples Outline
  • 3. 3 RegularExpression ? Regular expressions (regex) are a powerful method for describing a text pattern to be matched by various tools. ? There is only one place in bash where regular expressions are valid, using the =~ comparison in the [[ compound command, as in an if statement. ? Regular expressions are a crucial part of the larger toolkit for commands like grep, awk, and sed in particular.
  • 4. 4 RegularExpressionvs. PatternMatching ? Pattern matching is used by the shell commands such as the ls command. ? Regular expressions are used to search for strings of text in a file by using commands, such as the grep command. ? The use of regular expressions is generally associated with text processing.
  • 5. 5 CommandsinUse ? grep: The grep command searches the content of the files for a given pattern and prints any line where the pattern is matched. ? To use grep, you need to provide it with a pattern and one or more filenames (or piped data). ? Common command options: ? -c: Count the number of lines that match the pattern. ? -E: Enable extended regular expressions. ? -f: Read the search pattern from a provided file. A file can contain more than one pattern, with each line containing a single pattern. ? -i: Ignore character case. ? -l: Print only the filename and path where the pattern was found. ? -n: Print the line number of the file where the pattern was found. ? -p: Enable the Perl regular expression engine. ? -R, -r: Recursively search subdirectories.
  • 6. 6 CommandsinUse ? In general, grep is used like this: ? grep options pattern filenames ? To search the /home directory and all subdirectories for files containing the word password, regardless of uppercase/lowercase distinctions:
  • 7. 7 grepandegrep ? The grep command supports some variations, notably extended syntax for the regex patterns ? There are three ways to tell grep that you want special meaning on certain characters: 1. by preceding those characters with a backslash. 2. by telling grep that you want the special syntax (without the need for a backslash) by using the -E option when you invoke grep. 3. by using the command named egrep, which is a script that simply invokes grep as grep ¨CE so you don¡¯t have to. ? The only characters that are affected by the extended syntax are? + { | ( and ).
  • 8. Regular Expression Metacharacters ? Regular expressions are patterns that are created using a series of characters and metacharacters. ? Metacharacters such as the questions mark (?) and asterisk (*) have special meaning beyond their literal meanings in regex. ? The 7 lines of frost.txt file will be used in the next slides examples.
  • 9. Regular Expression Metacharacters ? 1 Two roads diverged in a yellow wood, ? 2 And sorry I could not travel both ? 3 And be one traveler, long I stood ? 4 And looked down one as far as I could ? 5 To where it bent in the undergrowth; ? 6 ? 7 Excerpt from The Road Not Taken by Robert Frost
  • 10. 10 Regular Expression Metacharacters ? The ¡°.¡± Metacharacter: ? The period (.) represents a single wildcard character. ? It will match on any single character except for a newline. ? If you want to treat this metacharacter as a period character rather than a wildcard, precede it with a backslash (.) to escape its special meaning. ? If we try to match on the pattern T.o, the first line of the frost.txt file is returned because it contains the word Two ? Regex patterns are also case sensitive, which is why line 3 of the file is not returned even though it contains the string too
  • 11. 11 Regular Expression Metacharacters ? The ¡°?¡± Metacharacter: ? The question mark (?) character makes any item that precedes it optional. ? It matches it zero or one time. ? This pattern will match on any three-character sequence that begins with T and ends with o as well as the two-character sequence To. ? Note that we are using egrep here. ? We could have used grep ¨CE, ? or we could have used ¡°plain¡± grep with a slightly different pattern: T.?o, putting the backslash on the question mark to give it the extended meaning.
  • 12. 12 Regular Expression Metacharacters ? The ¡°*¡± Metacharacter: ? The asterisk (*) is a special character that matches the preceding item zero or more times. ? It is similar to ?, the main difference being that the previous item may appear more than once. ? The .* in the preceding pattern allows any number of any character to appear between the T and o. ? Thus, the last line also matches because it contains the pattern The Ro.
  • 13. 13 Regular Expression Metacharacters ? The ¡°+¡± Metacharacter: ? The plus sign (+) metacharacter is the same as the * except it requires the preceding item to appear at least once. ? The preceding pattern specifies one or more of any character to appear in between the T and o. ? The first line of text matches because of Two ¡ª the w is one character between the T and the o. ? The second line doesn¡¯t match the To, as in the previous example; rather, the pattern matches a much larger string ¡ª all the way to the o in undergrowth. ? The last line also matches because it contains the pattern The Ro.
  • 14. Grouping: We can use parentheses to group characters. Among other things, this lets us treat the characters inside the parentheses as a single item that we can reference later. Here, parentheses and the Boolean OR operator (|) create a pattern that matches line 3. Line 3 as written contains the word traveler, but the pattern would also match if traveler were replaced by the word stranger.
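
The slide's pattern is not reproduced in this transcript, so the sketch below uses a pattern of the shape described (a group with two alternatives); the exact wording of the pattern and the input lines are assumptions.

```shell
# Hypothetical pattern: a group containing two alternatives joined by "|".
pattern='And be one (stranger|traveler), long I stood'

# The line as written in the poem matches through the "traveler" alternative.
traveler_hit=$(printf '%s\n' 'And be one traveler, long I stood' | grep -E "$pattern")

# A modified line matches through the "stranger" alternative.
stranger_hit=$(printf '%s\n' 'And be one stranger, long I stood' | grep -E "$pattern")

printf '%s\n' "$traveler_hit" "$stranger_hit"
```

Without the parentheses, the | would split the whole pattern in two, so the group is what limits the choice to that single word.
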
  • 15. Brackets and Character Classes: The square brackets, [ ], are used to define character classes and lists of acceptable characters. Using this construct, you can list exactly which characters are matched at this position in the pattern. This is particularly useful for user-input validation. As shorthand, you can specify ranges with a dash, such as [a-j]. Ranges follow your locale's collating sequence and alphabet. The pattern [a-j] matches one of the letters a through j.
  • 16. Brackets and Character Classes: Table 3-1 provides a list of common examples when using character classes and ranges. Be careful when defining a range for digits; a range can span at most 0 to 9. For example, the pattern [1-475] does not match numbers between 1 and 475; it matches any one of the digits (characters) in the range 1-4, or the character 7, or the character 5.
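
The [1-475] pitfall is easy to check. In this sketch, grep -x requires the whole line to match, so each input must be exactly one character from the class; the input values are illustrative.

```shell
# [1-475] is the range 1-4 plus the single characters 7 and 5.
# -x forces a whole-line match, so only one-character lines can match.
range_hits=$(printf '%s\n' 3 5 6 7 9 475 | grep -x '[1-475]')
printf '%s\n' "$range_hits"

# [a-j] matches a single letter from a through j.
letter_hits=$(printf '%s\n' c k | grep -x '[a-j]')
printf '%s\n' "$letter_hits"
```

Note that 475 itself is rejected: a bracket expression always stands for exactly one character.
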
  • 17. Brackets and Character Classes: There are also predefined character classes known as shortcuts. These indicate common character classes such as digits or letters.
  • 18. Brackets and Character Classes: These shortcuts are not fully supported by egrep (\d, for example, is not). To use them all, run grep with the -P option, which enables the Perl regular expression engine. Note: the option is an uppercase -P, not a lowercase -p.
  • 19. Brackets and Character Classes: Other character classes are valid only within the bracket syntax, as shown in Table 3-3. Each matches a single character, so to match several in a row, add * or + to get the repetition you need. To use one of these classes, it has to be inside the brackets, so you end up with two sets of brackets. This will match any line with an X followed by any uppercase letter or digit. It would match these lines:
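
As a sketch of the double-bracket syntax, the pattern below is reconstructed from the description (an X followed by an uppercase letter or digit); both the pattern and the input lines are assumptions, since the slide's own example is not reproduced in this transcript.

```shell
# Hypothetical input lines for illustration.
lines='ID X7 assigned
model XA
Xylophone
x9 lowercase'

# Outer brackets define the bracket expression; the inner [:upper:] and
# [:digit:] are the named classes placed inside it.
class_hits=$(printf '%s\n' "$lines" | grep 'X[[:upper:][:digit:]]')
printf '%s\n' "$class_hits"
```

Xylophone is skipped because the y is lowercase, and x9 is skipped because the pattern requires an uppercase X.
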
  • 21. Back References: Regex back references are one of the most powerful and often confusing regex operations. Consider the following file, tags.txt: Suppose you want to write a regular expression that will extract any line that contains a matching pair of complete HTML tags. The start tag has an HTML tag name; the ending tag has the same tag name but with a leading slash. <div> and </div> are a matching pair. You can search for these by writing a lengthy regex that contains all possible HTML tag values, or you can focus on the format of an HTML tag and use a regex back reference, as follows:
  • 22. Back References: In this example, the back reference is the \1 appearing in the latter part of the regular expression. It refers back to the expression enclosed in the first set of parentheses, [A-Za-z]*, which has two parts. The letter range in brackets denotes a choice of any letter, uppercase or lowercase. The * that follows it means to repeat that zero or more times. Therefore, the \1 refers to whatever was matched by that pattern in parentheses. If [A-Za-z]* matches div, then the \1 also refers to the pattern div.
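
A pattern with the parts described above can be sketched as follows. The exact pattern and the input lines are assumptions (the slide's regex and tags.txt are not reproduced in this transcript), and back references with grep -E are supported by GNU grep but are not guaranteed by POSIX for extended regexes.

```shell
# Hypothetical lines standing in for tags.txt.
lines='<i>line</i>
<div>great</span>
plain text
<u>!</u>'

# ([A-Za-z]*) captures the tag name; \1 requires the same name in the closing tag.
tag_hits=$(printf '%s\n' "$lines" | grep -E '<([A-Za-z]*)>.*</\1>')
printf '%s\n' "$tag_hits"
```

The <div>...</span> line is rejected because \1 holds div, and the closing tag does not repeat it.
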
  • 23. Back References: You can have more than one back reference in an expression and refer to each with a \1 or \2 or \3, depending on its order in the regular expression. A \1 refers to the first set of parentheses, \2 to the second, and so on. Note that the parentheses are metacharacters; they have a special meaning. If you just want to match a literal parenthesis, you need to escape its special meaning by preceding it with a backslash, as in sin\([0-9.]*\) to match expressions like sin(6.2) or sin(3.14159).
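
Both points can be tried in a short sketch: escaped parentheses as literals, and two back references in one pattern. The inputs and the four-character palindrome pattern are illustrative assumptions, and back references under grep -E again rely on GNU grep behavior.

```shell
# Escaped parentheses match literal "(" and ")" in an extended regex.
sin_hits=$(printf '%s\n' 'y = sin(6.2)' 'sin(3.14159)' 'sin x' 'sin(theta)' |
  grep -E 'sin\([0-9.]*\)')
printf '%s\n' "$sin_hits"

# Two groups, two back references: \2 repeats the second group, \1 the first,
# so the pattern matches four-character palindromes.
pal_hits=$(printf '%s\n' abba abab noon | grep -E '^(.)(.)\2\1$')
printf '%s\n' "$pal_hits"
```
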
  • 24. Quantifiers: Quantifiers specify the number of times an item must appear in a string. Quantifiers are defined by curly braces { }. For example, the pattern T{5} means that the letter T must appear consecutively exactly five times. The pattern T{3,6} means that the letter T must appear consecutively three to six times. The pattern T{5,} means that the letter T must appear five or more times.
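
The three forms can be compared on runs of T of different lengths. This sketch uses grep -x so that the whole line must satisfy the quantifier; the inputs are illustrative.

```shell
# Runs of 2, 3, 5, and 7 letters T.
runs='TT
TTT
TTTTT
TTTTTTT'

# -x anchors the pattern to the whole line.
exact5=$(printf '%s\n' "$runs" | grep -xE 'T{5}')
three_to_six=$(printf '%s\n' "$runs" | grep -xE 'T{3,6}')
five_plus=$(printf '%s\n' "$runs" | grep -xE 'T{5,}')

# Without -x, T{5} also matches inside the longer run of seven.
unanchored5=$(printf '%s\n' "$runs" | grep -cE 'T{5}')

printf 'T{5}: %s\nT{3,6}: %s\nT{5,}: %s\nunanchored T{5} lines: %s\n' \
  "$exact5" "$(echo $three_to_six)" "$(echo $five_plus)" "$unanchored5"
```

The unanchored count is a reminder that "exactly five" only holds when something else in the pattern bounds the run.
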
  • 25. Anchors and Word Boundaries: You can use anchors to specify that a pattern must exist at the beginning or the end of a string. The caret (^) character anchors a pattern to the beginning of a string. For example, ^[1-5] means that a matching string must start with one of the digits 1 through 5, as the first character on the line. The $ character anchors a pattern to the end of a string or line. For example, [1-5]$ means that a string must end with one of the digits 1 through 5. In addition, you can use \b to identify a word boundary, that is, the position between a word character and a non-word character such as a space or punctuation mark. The pattern \b[1-5]\b matches any of the digits 1 through 5 where the digit appears as its own word.
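
The three patterns above can be run against a few hypothetical lines. The inputs are assumptions chosen to separate the cases, and \b is a GNU grep extension rather than part of POSIX regexes.

```shell
# Hypothetical input lines.
lines='3 apples
apples 3
room 42
take 5 now
item(4)
9 lives'

# ^ anchors to the start of the line, $ to the end.
start_hits=$(printf '%s\n' "$lines" | grep '^[1-5]')
end_hits=$(printf '%s\n' "$lines" | grep '[1-5]$')

# \b requires a word boundary on both sides, so the digit must stand alone.
word_hits=$(printf '%s\n' "$lines" | grep '\b[1-5]\b')

printf 'start:\n%s\nend:\n%s\nword:\n%s\n' "$start_hits" "$end_hits" "$word_hits"
```

The line room 42 ends with a digit in range, so [1-5]$ accepts it, but \b[1-5]\b rejects it because neither digit stands alone. The line item(4) shows that punctuation counts as a boundary just as a space does.
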
  • 27. End. Dr. Aryaf Al-adwan, Autonomous Systems Dept

Editor's Notes

  • #4: awk is named after Aho, Weinberger, and Kernighan. The awk command is a Linux tool and programming language that allows users to process and manipulate data and produce formatted reports. sed is a text stream editor used on Unix systems to edit files quickly and efficiently. The tool searches through, replaces, adds, and deletes lines in a text file without opening the file in a text editor.
  • #18: \w is equivalent to [A-Za-z0-9_]. \s matches a space, a tab, a carriage return, a line feed, or a form feed: [ \t\r\n\f]. \f is the form feed (page separator) character. \D is the same as [^\d].