Regex in +60
By: Ghulam Imaduddin
ghulam@ideweb.co.id
Before We Start
 Tools:
 Notepad++ (Windows) - https://notepad-plus-plus.org/
 Sublime (Mac) - https://www.sublimetext.com/
 Online tools - https://regex101.com/
 Sample dataset
Common Type
 Character class: [abc]
 Character: . \s \w \d
 Quantifiers: ? + * {1}
 Anchors: ^ $
 Group/Capture: () (a|b)
Character class
 [abc]: a single character that is a, b, or c
 [^abc]: a single character that is anything except a, b, or c
 [a-z]: a single character in the range a-z
 [a-zA-Z0-9]: a single character in the range a-z, A-Z, or 0-9
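The four class forms above can be tried quickly in Python's standard `re` module. This is only a small sketch; the sample string `text` is made up for illustration and is not the deck's sample dataset.

```python
import re

# Hypothetical sample text, not taken from the slides.
text = "cab 42 Zed"

# [abc]: one character that is a, b, or c
in_set = re.findall(r"[abc]", text)

# [^abc]: one character that is anything except a, b, or c
not_in_set = re.findall(r"[^abc]", text)

# [a-z]: one lowercase ASCII letter
lowercase = re.findall(r"[a-z]", text)

# [a-zA-Z0-9]+ : runs of letters and digits (the + repeats the class)
alnum_runs = re.findall(r"[a-zA-Z0-9]+", text)

print(in_set, not_in_set, lowercase, alnum_runs)
```

Note that a class always matches exactly one character; repetition comes from a quantifier placed after it.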
Character
 . : any single character (by default, anything except a line break)
 \s: any whitespace (space, tab, line break); \S: any non-whitespace
 \d: any digit (equal to [0-9])
 \w: any word character (equal to [a-zA-Z0-9_])
 \t: tab character
 \r: carriage return
 \n: new line
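A short Python sketch of these shorthand tokens, using a made-up string `sample`. One caveat that applies to Python specifically: with ordinary `str` patterns, `\w` and `\d` also match non-ASCII letters and digits unless `re.ASCII` is passed, so the `[a-zA-Z0-9_]` equivalence holds exactly only in ASCII mode.

```python
import re

# Hypothetical sample: a word with an underscore and digit, a tab,
# another word, then a Windows-style line ending (\r\n).
sample = "id_7\tok\r\n"

digits = re.findall(r"\d", sample)         # single digits
words = re.findall(r"\w+", sample)         # runs of word characters (includes _)
spaces = re.findall(r"\s", sample)         # each whitespace character
non_spaces = re.findall(r"\S+", sample)    # runs of non-whitespace
any_chars = re.findall(r".", sample)       # . skips \n unless re.DOTALL is set

print(digits, words, spaces, non_spaces, len(any_chars))
```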
Quantifiers
 a?: zero or one of a
 a*: zero or more of a
 a+: one or more of a
 a{3}: exactly 3 of a
 a{3,}: 3 or more of a
 a{3,6}: between 3 and 6 of a
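To see where each quantifier draws the line, the sketch below checks whole strings with `re.fullmatch`. The helper name `matches` is introduced here for illustration only.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """Return True when the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

checks = {
    "a? on ''": matches(r"a?", ""),                # zero of a is allowed
    "a? on 'aa'": matches(r"a?", "aa"),            # two is too many
    "a* on ''": matches(r"a*", ""),                # zero or more
    "a+ on ''": matches(r"a+", ""),                # needs at least one
    "a{3} on 'aaa'": matches(r"a{3}", "aaa"),      # exactly three
    "a{3} on 'aaaa'": matches(r"a{3}", "aaaa"),
    "a{3,} on 'aa'": matches(r"a{3,}", "aa"),      # three or more
    "a{3,6} on 6 a's": matches(r"a{3,6}", "a" * 6),
    "a{3,6} on 7 a's": matches(r"a{3,6}", "a" * 7),
}

for label, result in checks.items():
    print(f"{label}: {result}")
```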
Anchors
 ^: start of string (or of each line, in multiline mode)
 $: end of string (or of each line, in multiline mode)
 \b: any word boundary
 \B: any non-word boundary
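Anchors match positions rather than characters, which is easiest to see by counting hits. The following Python sketch uses a made-up three-line string. In Python, `^` and `$` work per line only when `re.MULTILINE` is passed; editors such as Notepad++ typically behave that way by default.

```python
import re

# Hypothetical three-line sample.
lines = "cat\nconcat cat\ncatalog"

# Without MULTILINE, ^ and $ anchor to the whole string only.
start_whole = re.findall(r"^cat", lines)
end_whole = re.findall(r"cat$", lines)

# With MULTILINE, they anchor to every line.
start_each_line = re.findall(r"^cat", lines, flags=re.MULTILINE)
end_each_line = re.findall(r"cat$", lines, flags=re.MULTILINE)

# \b requires a word boundary, \B requires that there is none.
whole_word = re.findall(r"\bcat\b", lines)    # "cat" standing alone
word_suffix = re.findall(r"\Bcat\b", lines)   # "cat" ending a longer word

print(start_whole, end_whole, start_each_line, end_each_line)
print(len(whole_word), len(word_suffix))
```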
Group Capture
 (): capture everything enclosed
 (a|b): match either a or b
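A brief Python sketch of capturing and alternation. The address and the pattern are illustrative assumptions, not a recommended email validator.

```python
import re

# Hypothetical text containing one address.
text = "mail budi@example.id now"

# Each pair of parentheses captures the part of the match it encloses.
# The last group uses alternation: either "com" or "id".
m = re.search(r"(\w+)@(\w+)\.(com|id)", text)
whole = m.group(0)     # the entire match
parts = m.groups()     # the captured pieces, in order

# Captured groups can be reused in a replacement as \1, \2, ...
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# With exactly one group, findall returns what the group captured.
vowels = re.findall(r"gr(a|e)y", "gray grey groy")

print(whole, parts, swapped, vowels)
```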
Hands-on
One line to multiline
Input:
Expected Output:
Steps:
1. Replace each ", " with a line break (\n)
2. Parse name and email
3. Copy to Excel
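The same three steps can be sketched in Python. The slide's actual input and expected output are not preserved in this transcript, so the `raw` string and its `Name <email>` layout below are assumptions made purely for illustration.

```python
import re

# ASSUMPTION: hypothetical input; the deck's real sample is not available here.
raw = "Ana Putri <ana@example.com>, Budi Santoso <budi@example.com>"

# Step 1: change each comma separator into a line break.
multiline = re.sub(r",\s*", "\n", raw)

# Step 2: parse name and email from every line.
# (.+?) lazily captures the name; ([^>]+) captures what sits inside <...>.
rows = re.findall(r"^(.+?)\s*<([^>]+)>$", multiline, flags=re.MULTILINE)

# Step 3: tab-separated text pastes into Excel as two columns.
tsv = "\n".join(f"{name}\t{email}" for name, email in rows)

print(tsv)
```

In an editor, steps 1 and 2 correspond to two find-and-replace passes with regular-expression mode switched on.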
Cleansing & Reformat
Input:
1. Step 1:
 Find: ^(\d{1,2})-(\d{1,2})-(\d{4})$
 Replace with: \3-\2-\1
2. Step 2:
 Find: -(\d{1})$
 Replace with: -0\1
3. Step 3: remove non-date line ;)
Output: How:
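A Python sketch of this cleansing pass. The slide's input and output are not preserved in the transcript, so the `raw` data below is hypothetical (day-month-year dates plus one junk line). The replacement strings assume the backreference forms `\3-\2-\1` and `-0\1`.

```python
import re

# ASSUMPTION: hypothetical input; dates are day-month-year, one line is junk.
raw = "7-03-2015\n15-12-2014\nn/a\n2-11-2016"

# Step 1: reorder day-month-year into year-month-day using the three groups.
step1 = re.sub(r"^(\d{1,2})-(\d{1,2})-(\d{4})$", r"\3-\2-\1",
               raw, flags=re.MULTILINE)

# Step 2: pad a single trailing digit (the day) with a leading zero.
step2 = re.sub(r"-(\d{1})$", r"-0\1", step1, flags=re.MULTILINE)

# Step 3: keep only the lines that now look like dates.
cleaned = [line for line in step2.splitlines()
           if re.fullmatch(r"\d{4}-\d{1,2}-\d{2}", line)]

print(cleaned)
```

Step 2 pads only the last field. A single-digit month would need an analogous replacement, which the slide does not show.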
Q & A
