Preface 
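- Regular Expression, abbreviated REGEX
  - a set of rules for composing a string pattern
  - uses meta characters, i.e. characters that carry a special meaning
- grep is the utility most commonly used with regular expressions.
  - egrep and fgrep are variants of grep (equivalent to grep -E and grep -F).
- sed is a stream editor.
- awk is a pattern-processing language (interpreter).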
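What is a String Pattern?
- the rules that a particular kind of string follows
- e-mail address
  - has an @ character in the middle
  - right of the @: a host/domain name made of dot-separated parts
  - left of the @: the account name
- Web URL
  - starts with http://
  - after the hostname comes a URI shaped like a directory path
  - CGI-based URLs may also carry a ? (query string), etc.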
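To make the two informal rule sets above concrete, here is a minimal shell sketch. It assumes a POSIX shell and GNU grep (`grep -E` is the same as `egrep`). The function names `is_email` and `is_http_url` are hypothetical. The patterns follow only the rules listed above, not the full e-mail or URL grammars.

```shell
# is_email STRING: account@host.domain
#   left of @: no @ or whitespace; right of @: dot-separated name parts
is_email() {
  printf '%s\n' "$1" | grep -Eq '^[^@[:space:]]+@[[:alnum:]-]+(\.[[:alnum:]-]+)+$'
}

# is_http_url STRING: http:// + hostname + optional /path + optional ?query (CGI style)
is_http_url() {
  printf '%s\n' "$1" | grep -Eq '^http://[[:alnum:].-]+(/[^?[:space:]]*)?(\?.*)?$'
}

is_email "sunyzero@email.com" && echo "email ok"
is_email "no-at-sign.example.com" || echo "not an email"
is_http_url "http://example.com/cgi-bin/search?q=regex" && echo "url ok"
```

Because these checks only encode the rough rules from the slide, they will accept some strings a real mail server would reject (and vice versa).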
Regular Expression : Examples 
a.cdef? 
[a-zA-Z]+ 
.*boy 
(caret|dalar) 
(.*/)[^/]* 
^Do.*?$ 
http://([a-zA-Z0-9.-])/.* 
http://.*?(.*) 
Once you learn REGEX, you can read all of these!
POSIX regex: meta char. 
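Character
.        matches any single character.
Repetition
?        the preceding item occurs 0 or 1 time. - ERE
+        the preceding item repeats 1 or more times. - ERE
*        the preceding item repeats 0 or more times.
{...}    (interval) specifies the number of repetitions, for example
         {3} : exactly 3 times   {,7} : at most 7 times (GNU extension)   {2,5} : 2 to 5 times
Position
^        the start of a line.
$        the end of a line.
Group
[...]    any one character from the group listed inside the brackets.
[^...]   any character except those listed (the complement).
Other
\        (escape) removes the special meaning of a meta character.
|        (alternation) OR operation. - ERE
( )      parentheses group a pattern and enable back-references.
* POSIX RE - IEEE Std 1003.1 (international standard)
* ERE - Extended Regular Expression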
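One way to try the repetition operators and the escape is to test whole strings with `grep -Ex`. The `-x` flag requires the pattern to match the entire line. This sketch assumes GNU grep. `matches` and `yn` are hypothetical helper names.

```shell
# matches PATTERN STRING: succeeds when the ERE matches the whole STRING
matches() { printf '%s\n' "$2" | grep -Exq -- "$1"; }

# yn COMMAND...: print y if the command succeeds, n otherwise
yn() { if "$@"; then echo y; else echo n; fi; }

for s in ac abc abbc abbbc; do
  echo "$s  ab?c:$(yn matches 'ab?c' "$s")  ab+c:$(yn matches 'ab+c' "$s")" \
       " ab*c:$(yn matches 'ab*c' "$s")  ab{2,3}c:$(yn matches 'ab{2,3}c' "$s")"
done

yn matches 'a\.c' 'a.c'   # y: \. is a literal dot
yn matches 'a\.c' 'abc'   # n: the escaped dot no longer matches any character
```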
applying pattern 
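- dot/period : . - any single character
  - c.b : cab, cbb, ccb, cdb, c1b, c2b, etc.
  - a..b : axyb, a12b, ax0b, a#-b, etc.
  - a.........b : long runs of dots like this are hard to read (the interval form a.{9}b means the same thing).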
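The dot examples can be checked directly. This sketch assumes GNU grep. It shows that the nine-dot pattern and the interval form `a.{9}b` accept exactly the same strings. The sample strings are made up for illustration.

```shell
nine_dots='a.........b'   # a, nine arbitrary characters, b
interval='a.{9}b'         # the same pattern written as an interval

for s in a123456789b a12345678b 'a#-xyz_ 12b'; do
  # grep -c prints the number of matching lines; "|| true" keeps a 0 count from failing the script
  d=$(printf '%s\n' "$s" | grep -Ec -- "$nine_dots" || true)
  i=$(printf '%s\n' "$s" | grep -Ec -- "$interval" || true)
  echo "$s: dots=$d interval=$i"
done
```

Both columns agree for every input. Only the second string fails, because it has eight characters between `a` and `b`.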
applying pattern (cont'd)
- ?, +, *, {m,n} - iteration, interval
  - X?ML : XML or ML
  - can+ : can, cann, cannn, cannnn, ...
  - can* : ca, can, cann, cannn, ...
  - http.* : http://, httpd, https, http1234
    - "http" followed by anything (including nothing)
  - abc{2,5} : abcc, abccc, abcccc, abccccc
  - Not every RE matching engine supports interval expressions.
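The iteration examples above can be reproduced by filtering a word list with whole-line matching (`grep -Ex`). This is a minimal sketch assuming GNU grep. The word list and the `show` helper are illustrative only.

```shell
# Sample words; left unquoted below on purpose so the shell splits them one per line.
words='ML XML XXML ca can cann cannn abc abcc abccccc abcccccc'

# show PATTERN: print, on one line, the words that PATTERN matches in full
show() { printf '%s\n' $words | grep -Ex -- "$1" | paste -s -d ' ' -; }

show 'X?ML'       # ML XML
show 'can+'       # can cann cannn
show 'can*'       # ca can cann cannn
show 'abc{2,5}'   # abcc abccccc
```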
applying pattern (cont'd)
- ^, $ - position
  - ^ftp : lines that start with "ftp"
  - ^$ : empty lines (no characters at all, not even spaces)
  - <BR>$ : lines that end with <BR>
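The three anchor patterns can be run against a small sample. This sketch assumes GNU grep, and the text is made up for illustration. Only the line that begins with `ftp` matches `^ftp`, and `<BR>` in the middle of a line does not satisfy `<BR>$`.

```shell
# Sample text (illustrative): the fourth line is completely empty.
text='ftp://host/file
get it via ftp
line one<BR>

<BR> not at the end'

printf '%s\n' "$text" | grep -E '^ftp'    # ftp://host/file
printf '%s\n' "$text" | grep -Ec '^$'     # 1
printf '%s\n' "$text" | grep -E '<BR>$'   # line one<BR>
```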
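applying pattern (cont'd)
- [ ], [^ ] - character class
  - [abcd] : a, b, c, d
  - [0-9] : 0, 1, 2, ... , 9
  - [a-zA-Z0-9] : letters and digits
  - [^0-9] : anything except [0-9]
  - How do you put ^ itself in a group?
    - Just don't place it immediately after [ (e.g. [a^]).
    - Or escape it, in tools that allow escapes inside brackets (in POSIX bracket expressions \ is literal)...
- Interval expressions are not enabled by default in every utility.
  - e.g. awk (older gawk versions required the --re-interval option)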
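Here is a short sketch of character classes, including a literal `^` inside brackets. It assumes GNU grep. The sample string is made up.

```shell
s='2^10 = 1024, x^2'

printf '%s\n' "$s" | grep -Eo '[0-9]+'     # 2, 10, 1024, 2 (one per line)
printf '%s\n' "$s" | grep -Eo '[^0-9 ]+'   # ^, =, ",", x^ : neither digit nor space
printf '%s\n' "$s" | grep -Eo '[x^]'       # ^, x, ^ : ^ not right after [ is literal
```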
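greedy matching
- What is greedy matching?
$ var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"
$ echo $var2 | egrep -o "<.+>"
<b>real</b>It's gonna <i>change everything</i>
  - The pattern matches the longest possible span.
  - With greedy matching, you have to narrow the result set yourself by tightening the pattern.
  - Inconvenient...
- What is non-greedy matching?
  - Matching that returns the shortest match instead of the greedy (longest) one.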
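non-greedy matching (cont'd)
- Emulating non-greedy matching (POSIX ERE has no lazy quantifier such as .+?)
$ var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"
$ echo $var2 | egrep -o "<.+>"
<b>real</b>It's gonna <i>change everything</i>
$ echo $var2 | egrep -o "<[^<>]+>"
<b>
</b>
<i>
</i>
  - [^<>]+ cannot run past a >, so each match ends at the first > after its <.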
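The same negated-class trick works for any delimiter. This sketch assumes GNU grep and uses a made-up `name=/home=` line. It extracts double-quoted values: `".*"` greedily spans both values, while `"[^"]*"` stops at the next quote.

```shell
line='name="Steven Kim" home="/home/sunyzero"'

printf '%s\n' "$line" | grep -Eo '".*"'      # greedy: one match spanning both values
printf '%s\n' "$line" | grep -Eo '"[^"]*"'   # "not a quote" class: each value separately
```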
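back-reference
- Reuse what a group has already matched (back-reference)
  - A part matched inside "( )" is referred to as "\#" (# is the group number); \0 is the whole match.
$ egrep "^(.+):x:[0-9]+:[0-9]+:.*:/home/\1:" /etc/passwd
sunyzero:x:500:500:Steven Kim:/home/sunyzero:/bin/bash
linuxer:x:502:502::/home/linuxer:/bin/bash
$ egrep -v "^(.+):x:[0-9]+:[0-9]+:.*:/home/\1:" /etc/passwd
... (output omitted: too long) ...
  - The first command lists the accounts whose home directory is /home/<account name>.
  - -v : invert the match (print the lines that do not match)
  - --color : highlight the matched (non-empty) strings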
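A classic use of back-references is finding doubled words. This sketch assumes GNU grep, where `\b` (word boundary) and back-references in `-E` mode are supported as extensions. The sample lines are made up. `\1` must repeat exactly what group 1 captured, and the `\b` anchors stop `this is` from matching as `is is`.

```shell
# Print lines containing a word immediately repeated after one space.
printf '%s\n' 'this is is a test' 'no repeats here' 'the the end' |
  grep -E '\b([[:alpha:]]+) \1\b'
# prints:
# this is is a test
# the the end
```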
back-reference (cont'd)
- back-reference example: extract a tag together with the text it encloses
$ var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"
$ echo $var2 | egrep -o "<([a-zA-Z0-9]+)>.*</\1>"
<b>real</b>
<i>change everything</i>
$ echo $var2 | egrep --color "<([a-zA-Z0-9]+)>.*</\1>"
... (omitted) ...
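Tip! - sed (stream editor)
- substitution
  - same as the vim substitution command
  - -E enables ERE; in sed's default BRE syntax, + is a literal character.
$ var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"
$ echo $var2 | sed -E "s/<[^<>]+>/ /g"
It's gonna be  real It's gonna  change everything  I feel
$ echo $var2 | sed -E "s,<[^<>]+>, ,g"
It's gonna be  real It's gonna  change everything  I feel
  - Any character, such as ",", can replace "/" as the delimiter.
  - vim's substitution command is essentially sed's!
    = Learn sed and you get vim for free... UNIX tools are interrelated like this, so many features carry over.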
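sed substitutions can also use back-references in the replacement. This is a minimal sketch assuming a sed that supports `-E` (GNU sed and BSD sed both do). It swaps "First Last" into "Last, First". The `|` delimiter in the second line avoids clashing with the comma in the replacement.

```shell
# \1 and \2 in the replacement refer to the two ( ) groups in the pattern.
echo "Steven Kim" | sed -E 's/^([[:alpha:]]+) ([[:alpha:]]+)$/\2, \1/'
echo "Steven Kim" | sed -E 's|^([[:alpha:]]+) ([[:alpha:]]+)$|\2, \1|'
# both print: Kim, Steven
```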
Tip! - awk
- awk can do all of this as well.
$ var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"
$ echo $var2 | awk '{ gsub(/[ ]*<[^<>]+>[ ]*/, " "); print }'
It's gonna be real It's gonna change everything I feel
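awk can also apply a regex to a single field with the `~` operator. This sketch splits passwd-style lines on `:`. Two sample lines come from the back-reference slide. The `daemon` line and the `passwd_sample` helper are hypothetical. It prints accounts whose shell ends in `bash` and whose home directory is under `/home/`.

```shell
# Hypothetical sample data in /etc/passwd format
passwd_sample() {
  printf '%s\n' \
    'sunyzero:x:500:500:Steven Kim:/home/sunyzero:/bin/bash' \
    'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
    'linuxer:x:502:502::/home/linuxer:/bin/bash'
}

# -F: splits on colons; $1 = account, $6 = home directory, $7 = login shell
passwd_sample | awk -F: '$7 ~ /bash$/ && $6 ~ /^\/home\// { print $1 }'
# prints: sunyzero, then linuxer
```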
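alternation
- ( ) used with | for alternation
  - "( )" groups the alternatives, or any other part of a pattern.
$ echo "cat is not dog" | egrep -o "(cat|dog)"
cat
dog
$ echo "My Childhood~~~ bye bye" | egrep -o "(child|boy)?hood"
hood
  - Only "hood" is printed: matching is case-sensitive, so "Child" does not match "child" and the optional group matches nothing.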
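Adding `-i` makes the same alternation case-insensitive. This sketch assumes GNU grep. The third input is made up to show that each match picks its own alternative.

```shell
s="My Childhood~~~ bye bye"
printf '%s\n' "$s" | grep -Eo '(child|boy)?hood'     # hood       (case-sensitive)
printf '%s\n' "$s" | grep -Eoi '(child|boy)?hood'    # Childhood  (-i ignores case)
printf '%s\n' 'boyhood and hood' | grep -Eo '(child|boy)?hood'   # boyhood, then hood
```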
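predefined character class
Class          Meaning
[[:alnum:]]    letters and digits
[[:alpha:]]    letters (alphabetic characters)
[[:blank:]]    space and tab (\t)
[[:cntrl:]]    control characters
[[:digit:]]    digits
[[:xdigit:]]   hexadecimal digits, i.e. 0-9a-fA-F
[[:upper:]]    upper-case letters
[[:lower:]]    lower-case letters
[[:space:]]    whitespace: space, tab (\t), CR (\r), newline (\n), etc.
[[:print:]]    printable characters
[[:graph:]]    printable characters other than space
[[:punct:]]    printable special characters (punctuation)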
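Several of these classes can be tried on one made-up string. This sketch assumes GNU grep and an ASCII-compatible locale. The last line uses `tr`, which accepts the same class names without the outer brackets.

```shell
s='Kim, Sun-young: AB-0105R'

printf '%s\n' "$s" | grep -Eo '[[:upper:]]+'   # K, S, AB, R
printf '%s\n' "$s" | grep -Eo '[[:digit:]]+'   # 0105
printf '%s\n' "$s" | grep -Eo '[[:punct:]]'    # , - : -
printf '%s\n' "$s" | tr -d '[:space:]'         # Kim,Sun-young:AB-0105R
```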
predefined character class (cont'd)
- classes can be combined inside [...]
$ var5="sunyzero@email.com:010-8500-80**:Sun-young Kim:AB-0105R" 
$ echo $var5 | egrep -o "^[[:alpha:]@]+" 
sunyzero@email 
$ echo $var5 | egrep -o "[[:upper:][:digit:]-]{8}" 
010-8500 
AB-0105R 
- The match stops at sunyzero@email. Why?
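boundary - ERE
- used to find word boundaries (\b and \B are GNU extensions)
\b   matches only at a word boundary (the edge of a word)
\B   matches only where there is no word boundary (inside a word)
$ var3="abc? <def> 123hijklm"
$ echo "$var3" | egrep -o "[a-j]+"
abc
def
hij
$ echo "$var3" | egrep --color "\B[a-j]+\B"
abc? <def> 123hijklm      (highlighted: b, e, hij)
$ echo "$var3" | egrep --color "\b[a-j]+\b"
abc? <def> 123hijklm      (highlighted: abc, def)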
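With `-o` instead of `--color`, the effect of `\b` prints as plain text. This sketch assumes GNU grep. `hij` disappears from the second result because it sits inside the word `123hijklm`, not at its edges.

```shell
var3="abc? <def> 123hijklm"

# Quoting "$var3" stops the shell from treating "abc?" as a glob pattern.
printf '%s\n' "$var3" | grep -Eo '[a-j]+'        # abc, def, hij
printf '%s\n' "$var3" | grep -Eo '\b[a-j]+\b'    # abc, def
```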
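REGEX and PCRE
- POSIX REGEX
  - supports the basic kinds of matching.
  - performance drops when patterns get very complex.
  - Start by learning POSIX REGEX first - it's the standard!
- PCRE (Perl Compatible Regular Expressions)
  - Perl-style extended regular expressions
  - richer matching features and extensions...
  - supported in C, C++, and most other languages (often through an add-on library)
  - These days PCRE is widely used.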
