Python Regular Expressions
For video lectures, check out
www.facebook.com/CSxFunda
What is a regular expression?
“A string that defines a text
matching pattern”
Jill roll number is 1001
Bob roll number is 1002
Rob roll number is 1003
Jack roll number is 1004
Extract the roll numbers?
Regular expression: \d\d\d\d
Extracted roll numbers:
1001
1002
1003
1004
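The roll-number extraction above can be sketched in a few lines. This is only an illustration: the variable names are assumptions, and the text is the four sample sentences from the slide.

```python
import re

text = """Jill roll number is 1001
Bob roll number is 1002
Rob roll number is 1003
Jack roll number is 1004"""

# \d matches one digit, so \d\d\d\d matches four digits in a row
roll_numbers = re.findall(r'\d\d\d\d', text)
print(roll_numbers)
```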
What is the advantage of using
regular expressions?
o Using regular expressions, you can extract text that follows a pattern by writing only a few lines of code
Example
Text file:
A weight is 46kg
B weight is 54kg
C weight is 60kg
D weight is 70kg
Values to extract:
46
54
60
70
Without using regular expressions:
o Lengthy code
o Complex
Example
Same text file and same values to extract (46, 54, 60, 70)
Using regular expressions:
o Less code
o Simple
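To make the comparison concrete, here is a rough sketch of both approaches on the weight lines. The hand-written loop is just one possible non-regex approach, and all names are illustrative assumptions.

```python
import re

lines = ["A weight is 46kg", "B weight is 54kg",
         "C weight is 60kg", "D weight is 70kg"]

# Without regular expressions: walk each line and collect digits by hand
manual = []
for line in lines:
    digits = ""
    for ch in line:
        if ch.isdigit():
            digits += ch
    manual.append(digits)

# With regular expressions: one search per line
with_regex = [re.search(r'\d+', line).group() for line in lines]

print(manual)
print(with_regex)
```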
Python String
o A sequence of characters enclosed in single or double quotes
Ex: 'Kalyan', "Meghana"
Python Raw Strings
o A sequence of characters enclosed in single or double quotes and preceded by r; backslashes in it are kept as written
Ex: r'Kalyan', r"Meghana"
Strings vs Raw Strings
o You can write a regular expression as a string or as a raw string
o In an ordinary string, each backslash in the regular expression must be escaped with another backslash, as in '\\d'
o In a raw string, backslashes need no escaping, as in r'\d'
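A small sketch of the difference, assuming nothing beyond the standard re module. The sample text 'my cat' and the variable names are made up for illustration.

```python
import re

plain = '\\d+'   # ordinary string: the backslash is doubled
raw = r'\d+'     # raw string: the backslash is written once
same_pattern = (plain == raw)

# In an ordinary string '\b' is a backspace character, not a word boundary,
# so the first pattern looks for literal backspaces around "cat"
backspace_hits = re.findall('\bcat\b', 'my cat')
boundary_hits = re.findall(r'\bcat\b', 'my cat')

print(same_pattern, backspace_hits, boundary_hits)
```

This is the usual reason for writing patterns as raw strings: escapes such as \b reach the regex engine unchanged.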
Regular Expressions
o Regular expressions are supported by many programming languages
Ex: Perl, Ruby, Java, Python, JavaScript, ...
o Some languages provide regex capabilities built in
Ex: Perl
o Some languages provide regex capabilities via libraries
Ex: Python
re Module
o Python supports regular expressions through the re module
o That is, you have to import the re module to use regular expressions
import re
o The re module ships with Python, so there is no need to install it separately
Steps
Import the re module
Write the regular expression
Create a regex object with re.compile()
Call a method on the regex object
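The four steps can be sketched as follows. The pattern and the sample sentence are assumptions chosen only to illustrate the flow.

```python
import re                                    # 1. import the re module

pattern = r'\d+kg'                           # 2. write the regular expression
regex = re.compile(pattern)                  # 3. create the regex object
weights = regex.findall('A weight is 46kg, B weight is 54kg')  # 4. call a method

print(weights)
```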
re Module Functions
Called on a compiled regex object:
o match(text)
o search(text)
o findall(text)
o finditer(text)
o sub(replString, text)
o split(text)
match()
o Looks for a match only at the beginning of the string
o Returns a match object if there is a match, otherwise returns None
regex = re.compile(pattern)
mo = regex.match(text)
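A short sketch of match(), with sample strings invented for illustration:

```python
import re

regex = re.compile(r'\d+')
at_start = regex.match('1001 is the roll number')    # digits at position 0
not_at_start = regex.match('Roll number is 1001')    # digits occur later

print(at_start.group())   # the matched text
print(not_at_start)       # no match object is returned
```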
search()
o Looks for a match anywhere in the string
o Returns a match object if there is a match, otherwise returns None
o If the string has more than one match, returns the match object for the first match only
regex = re.compile(pattern)
mo = regex.search(text)
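A minimal sketch of search() on a made-up sentence containing two numbers; only the first one is reported.

```python
import re

regex = re.compile(r'\d+')
mo = regex.search('Jill is 1001 and Bob is 1002')

first = mo.group()   # text of the first match
span = mo.span()     # (start, end) positions of that match
missing = regex.search('no digits here')

print(first, span, missing)
```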
findall()
o Looks for matches anywhere in the string
o Returns all matched substrings as a list if there is a match, otherwise returns an empty list
regex = re.compile(pattern)
values = regex.findall(text)
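A short sketch of findall(), again with illustrative input strings:

```python
import re

regex = re.compile(r'\d+')
values = regex.findall('A weight is 46kg, B weight is 54kg')
nothing = regex.findall('no digits here')

print(values, nothing)
```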
finditer()
o Looks for matches anywhere in the string
o Returns an iterator that yields a match object for each match; the iterator yields nothing if there is no match
regex = re.compile(pattern)
moIter = regex.finditer(text)
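A sketch of finditer(). Note that it returns an iterator of match objects rather than a list, so it is usually consumed in a loop or a comprehension. The input strings are assumptions.

```python
import re

regex = re.compile(r'\d+')

# Each item is a match object, so positions are available as well as text
found = [(mo.group(), mo.start()) for mo in regex.finditer('A 46kg, B 54kg')]
empty = list(regex.finditer('no digits here'))

print(found, empty)
```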
sub()
o Replaces all the matched substrings with the given replString and returns the modified string
o Returns the original string unchanged if there is no match
o Similar to the replace option in text editors
regex = re.compile(pattern)
regex.sub(replString, text)
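A minimal sketch of sub(), masking every number in an illustrative sentence:

```python
import re

regex = re.compile(r'\d+')
masked = regex.sub('XXXX', 'Jill is 1001 and Bob is 1002')
unchanged = regex.sub('XXXX', 'no digits here')

print(masked)
print(unchanged)
```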
split()
o Looks for matches anywhere in the string
o Splits the string at the matched substrings and returns the pieces as a list
o Returns a one-element list containing the original string if there is no match
o Similar to the split() method of strings
regex = re.compile(pattern)
regex.split(text)
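A sketch of split() with a separator pattern that a plain str.split() could not express in one call. The separator pattern and the input are assumptions.

```python
import re

# Split on a comma or semicolon, followed by optional whitespace
regex = re.compile(r'[,;]\s*')
parts = regex.split('46kg, 54kg;60kg,  70kg')
no_match = regex.split('46kg')   # nothing to split on

print(parts, no_match)
```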
Groups
o Grouping is used when you want to match a substring and extract only a part of the matched substring
Ex: match the roll number CS1004 and extract the last four digits
Groups - Types
o Numbered Groups
o Named Groups
o Non-capturing Groups
Numbered Groups
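The slide's own example is not preserved in this transcript, so the following is only a sketch of numbered groups using the CS1004 roll number mentioned earlier. The pattern is an assumption: each pair of parentheses forms a group, numbered from 1 in order of its opening parenthesis, and group 0 is the whole match.

```python
import re

regex = re.compile(r'([A-Z]{2})(\d{4})')
mo = regex.search('Roll number: CS1004')

whole = mo.group(0)    # the entire match
dept = mo.group(1)     # first group: the two letters
digits = mo.group(2)   # second group: the last four digits

print(whole, dept, digits)
```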
Named Groups
o When a pattern has a large number of groups, it is difficult to remember the group numbers
o In such a case, we use named groups
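A sketch of named groups, using Python's (?P<name>...) syntax. The group names dept and number are hypothetical, chosen for this illustration.

```python
import re

regex = re.compile(r'(?P<dept>[A-Z]{2})(?P<number>\d{4})')
mo = regex.search('CS1004')

dept = mo.group('dept')      # look up a group by name
number = mo.group('number')
as_dict = mo.groupdict()     # all named groups at once

print(dept, number, as_dict)
```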
Non-capturing Group (?:...)
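A sketch of a non-capturing group, assuming a made-up pattern: (?:...) groups part of the pattern (here, two alternative prefixes) without creating a numbered group, so only the digits are captured.

```python
import re

regex = re.compile(r'(?:CS|EE)(\d{4})')
mo = regex.search('EE2041')

captured = mo.groups()                        # only the digits are captured
all_numbers = regex.findall('CS1004 EE2041')  # findall returns the one captured group

print(captured, all_numbers)
```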
Meta Characters
| (pipe)
? (question mark)
* (asterisk)
+ (plus symbol)
. (dot symbol)
| (pipe)
Matches any one of several alternative patterns
Input:
A weight is 42kg
B weight is 100kg
C weight is 30kg
D weight is 111kg
Matches:
42
100
30
111
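The pattern used on the slide is not preserved, so this sketch assumes one that produces the matches listed: three digits or two digits. Alternatives are tried left to right, which is why the longer alternative is written first.

```python
import re

text = "A weight is 42kg\nB weight is 100kg\nC weight is 30kg\nD weight is 111kg"

weights = re.findall(r'\d\d\d|\d\d', text)       # try three digits, then two
wrong_order = re.findall(r'\d\d|\d\d\d', text)   # two digits always wins first

print(weights)
print(wrong_order)
```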
? (question mark)
Matches zero or one occurrence of the preceding item
Input:
A weight is 42kg
B weight is 100kg
C weight is 30kg
D weight is 111kg
Matches:
42
100
30
111
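Again the slide's pattern is lost, so this sketch assumes two digits followed by an optional third digit, which gives the matches listed.

```python
import re

text = "A weight is 42kg\nB weight is 100kg\nC weight is 30kg\nD weight is 111kg"

# \d? makes the third digit optional
weights = re.findall(r'\d\d\d?', text)
print(weights)
```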
* (asterisk)
Matches zero or more occurrences of the preceding item
Input:
abbbc
abc
ac
Matches:
abbbc
abc
ac
+ (plus symbol)
Matches one or more occurrences of the preceding item
Input:
abbbc
abbc
abc
Matches:
abbbc
abbc
abc
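A sketch contrasting the asterisk and the plus symbol. The patterns ab*c and ab+c are assumptions consistent with the sample strings; the only difference shows up on 'ac', which has no b at all.

```python
import re

text = 'abbbc abc ac'

star = re.findall(r'ab*c', text)   # b may be absent
plus = re.findall(r'ab+c', text)   # at least one b is required

print(star)
print(plus)
```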
. (dot symbol)
Matches any character except the newline character '\n'
Matches
Kalyan007
Kalyann007
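A sketch of the dot's behaviour around a newline. The pattern and test strings here are hypothetical and are not taken from the slide.

```python
import re

regex = re.compile(r'Kalyan.007')

same_line = regex.search('Kalyan-007')        # '-' is matched by the dot
across_newline = regex.search('Kalyan\n007')  # the dot does not match '\n'
with_dotall = re.search(r'Kalyan.007', 'Kalyan\n007', re.DOTALL)

print(same_line, across_newline, with_dotall)
```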
pattern{m}
Matches exactly m repetitions of the pattern
Ex: \d{3} matches exactly 3 digits
pattern{m,n}
Matches a minimum of m and a maximum of n repetitions of the pattern
pattern{m,}
Matches a minimum of m repetitions of the pattern, with no upper limit
Ex: \d{3,} matches exactly 3 digits, exactly 4 digits, exactly 5 digits, exactly 6 digits, and so on
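The three repetition forms can be compared side by side. This sketch assumes a digit pattern and adds \b word boundaries so that each run of digits is tested as a whole; without them, \d{3} would also match the first three digits of a longer number.

```python
import re

text = '12 123 1234 12345'

exactly_three = re.findall(r'\b\d{3}\b', text)    # {m}
two_to_three = re.findall(r'\b\d{2,3}\b', text)   # {m,n}
three_or_more = re.findall(r'\b\d{3,}\b', text)   # {m,}

print(exactly_three, two_to_three, three_or_more)
```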
Greedy Matching
Looks for the maximum possible match
Input: abcabcabcabc
Greedy match: abcabcabcabc
Non-greedy Matching (?)
Looks for the minimum possible match
Input: abcabcabcabc
Non-greedy match: abc
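A sketch of greedy versus non-greedy matching on the same input. The pattern is an assumption: one or more repetitions of abc, made non-greedy by appending ? to the quantifier.

```python
import re

text = 'abcabcabcabc'

greedy = re.search(r'(?:abc)+', text).group()       # takes as many repetitions as possible
non_greedy = re.search(r'(?:abc)+?', text).group()  # stops at the first repetition

print(greedy)
print(non_greedy)
```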
Character Classes
Matches any one character from a set of characters
Types
Positive Character Class
Negative Character Class
Shorthand Character Class
Positive Character Class
Matches any one of the characters specified in []
[abc]         Matches a, b or c
[aeiou]       Matches a, e, i, o or u
[0123456789]  Matches any digit from 0 to 9
[a-c0-9]      Matches a, b, c or any digit from 0 to 9
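A brief sketch of positive character classes; the input strings are made up for illustration.

```python
import re

vowels = re.findall(r'[aeiou]', 'regular')        # one vowel per match
mixed = re.findall(r'[a-c0-9]', 'abcd 2024 xyz')  # a to c, or any digit

print(vowels)
print(mixed)
```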
Negative Character Class
Matches any character other than the characters specified in [^]
[^aeiou]  Matches any character other than a, e, i, o, u
Input:
b1001
c1002
d1003
f1004
h1005
Matches:
b1001
c1002
d1003
f1004
h1005
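The slide's pattern is not preserved; this sketch assumes a non-vowel followed by four digits. The identifier a1006 is a hypothetical extra added to show a rejected value.

```python
import re

regex = re.compile(r'[^aeiou]\d{4}')
ids = ['b1001', 'c1002', 'd1003', 'f1004', 'h1005', 'a1006']

# fullmatch requires the whole identifier to fit the pattern
accepted = [s for s in ids if regex.fullmatch(s)]
print(accepted)
```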
Shorthand Character Class
Shorthand  Matches                       Equivalent to
\d         Any decimal digit             [0-9]
\D         Any non-digit                 [^0-9]
\w         Any word character            [a-zA-Z_0-9]
\W         Any non-word character        [^a-zA-Z_0-9]
\s         Any whitespace character      [ \n\t\r\f\v]
\S         Any non-whitespace character  [^ \n\t\r\f\v]
These equivalents apply to ASCII matching (re.ASCII); by default, Python 3 str patterns also match the corresponding Unicode characters.
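A short sketch of the shorthand classes on an invented sample string containing a space and a tab:

```python
import re

text = 'Roll_no 1001\tOK'

digits = re.findall(r'\d', text)       # single digits
words = re.findall(r'\w+', text)       # runs of word characters
spaces = re.findall(r'\s', text)       # whitespace characters
non_digits = re.findall(r'\D+', text)  # runs of anything but digits

print(digits, words, spaces, non_digits)
```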
Anchoring
Specifies where in the input the match must occur
Anchor  Meaning
^       Start of line or string
$       End of line or string
\A      Start of string
\Z      End of string
\b      Word boundary
\B      Non-word boundary
^ - Start of line or string
Specifies the location of the match as “start of line or string”
r'hello'   Matches hello anywhere in the input string
r'^hello'  Matches hello only at the beginning of the input string
$ - End of line or string
Specifies the location of the match as “end of line or string”
r'bye'   Matches bye anywhere in the input string
r'bye$'  Matches bye only at the end of the input string
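A sketch of the ^ and $ anchors on a made-up three-line text. By default they anchor to the whole string; with re.MULTILINE they also anchor to each line.

```python
import re

text = 'hello world\nsay hello\ngood bye'

starts_with_hello = re.search(r'^hello', text) is not None
say_default = re.findall(r'^say', text)                 # "say" is not at the string start
say_multiline = re.findall(r'^say', text, re.MULTILINE) # but it is at a line start
ends_with_bye = re.search(r'bye$', text) is not None
bye_not_at_end = re.search(r'bye$', 'bye for now')

print(starts_with_hello, say_default, say_multiline, ends_with_bye, bye_not_at_end)
```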
\b – Word boundary
Specifies a word boundary
Matches the empty position between a word character [a-zA-Z0-9_] and a non-word character (or the start or end of the string); it does not consume any character
r'\bcat\b'
Matches cat in
“My cat”
“Your cat”
“(cat) is pet”
But not in
“cat1 is good”
“Concatenation of strings”
“catalyst is zinc”
\B – Non-word boundary
Specifies a position that is not a word boundary
Opposite of \b
r'\Bcat\B'
Matches cat in
“Concatenation of strings”
“Acatalyst is zinc”
But not in
“My cat”
“Your cat”
“cat1 is good”
“(cat) is pet”
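A sketch that runs both boundary patterns over sentences from the slides. Note that cat1 does not match \bcat\b, because the digit 1 is a word character and so there is no boundary between t and 1.

```python
import re

word = re.compile(r'\bcat\b')    # cat as a whole word
inner = re.compile(r'\Bcat\B')   # cat strictly inside a longer word

sentences = ['My cat', '(cat) is pet', 'cat1 is good',
             'Concatenation of strings', 'Acatalyst is zinc']

whole_word = [s for s in sentences if word.search(s)]
inside_word = [s for s in sentences if inner.search(s)]

print(whole_word)
print(inside_word)
```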
Compilation Flags
o compile() has two parameters: the first is the pattern and the second, which is optional, is the compilation flag
re.compile(pattern[, flags])
o Compilation flags can be passed to compile() or embedded in the regex pattern itself
pattern = r'(?i)\w+'
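A sketch showing that a flag passed to compile() and the same flag embedded in the pattern behave alike. The pattern and input are illustrative assumptions.

```python
import re

by_argument = re.compile(r'hello', re.IGNORECASE)   # flag as second argument
inline = re.compile(r'(?i)hello')                   # flag embedded in the pattern

first = by_argument.search('Say HELLO').group()
second = inline.search('Say HELLO').group()

print(first, second)
```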
Compilation Flags
Compilation Flag        Inline  Meaning
re.IGNORECASE or re.I   (?i)    Case-insensitive matching
re.DOTALL or re.S       (?s)    . matches any character, including '\n'
re.VERBOSE or re.X      (?x)    Ignores whitespace and comments in the pattern
re.MULTILINE or re.M    (?m)    Enables multi-line mode: ^ and $ also match at the start and end of each line
re.UNICODE or re.U      (?u)    Enables Unicode mode (already the default for str patterns in Python 3)
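A sketch of three of the flags in action; the sample text and patterns are assumptions made for illustration.

```python
import re

text = 'first line\nsecond line'

# DOTALL: lets . cross the newline between the two lines
dot_default = re.search(r'first.*second', text)
dot_all = re.search(r'first.*second', text, re.DOTALL)

# MULTILINE: ^ matches at the start of every line
line_starts = re.findall(r'^\w+', text, re.MULTILINE)

# VERBOSE: whitespace and comments inside the pattern are ignored
verbose = re.compile(r"""
    \d+    # the number
    kg     # the unit
""", re.VERBOSE)
weights = verbose.findall('A weight is 46kg')

print(dot_default, dot_all is not None, line_starts, weights)
```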
Assertions
o Lookahead assertions
  o Positive lookahead assertion (?=...)
  o Negative lookahead assertion (?!...)
o Lookbehind assertions
  o Positive lookbehind assertion (?<=...)
  o Negative lookbehind assertion (?<!...)
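The slides list the four assertion types without examples, so the following is only a sketch with made-up inputs. Assertions check what comes before or after the current position without including it in the match.

```python
import re

weights = '46kg 54lb 60kg'
rolls = 'CS1004 EE2041'

# Positive lookahead: digits that are followed by kg
followed_by_kg = re.findall(r'\d+(?=kg)', weights)

# Negative lookahead: digits not followed by kg (or by another digit,
# which stops the engine from settling for a shorter run of digits)
not_kg = re.findall(r'\d+(?!\d|kg)', weights)

# Positive lookbehind: four digits preceded by CS
after_cs = re.findall(r'(?<=CS)\d{4}', rolls)

# Negative lookbehind: four digits not preceded by CS
not_after_cs = re.findall(r'(?<!CS)\d{4}', rolls)

print(followed_by_kg, not_kg, after_cs, not_after_cs)
```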
