Regex
- 1. Preface
Regular Expression (蠏) 曙広 REGEX
string pattern 覓語伎 譟壱 蠏豺
meta charater るジ 覩碁ゼ 覓語
grep 蠏 螳 碁Μ一.
egrep, fgrep grep 麹 覯.
sed ろ碁 一.
awk 伎 る0 語危伎.
- 2. what is String Pattern?
譟壱 覓語伎 蠏豺
e-mail 譯殊
譴螳 @ 覓語螳 煙
@ 覓語 るジ讓曙 dot 覓, ろれ襦 企伎
@ 覓語 殊曙 螻覈
Web URL
http:// 朱
語ろ語企れ URI 螳 覿螻 襴蟲譟磯 覈覈
CGI 蠍磯 蟆曙一 ? 煙ロ
- 3. Regular Expression : Examples
a.cdef?
[a-zA-Z]+
.*boy
(caret|dalar)
(.*/)[^/]*
^Do.*?$
http://([a-zA-Z0-9.-])/.*
http://.*?(.*)
REGEX襯 覦一 れ 伎企!
- 4. POSIX regex: meta char.
覓語 讌 . 覓語 螳襯 覩誤.
覦覲 讌
? 覓語伎 0螳 轟 1螳 . - ERE
+ 覓語伎 1螳 伎 覦覲給. - ERE
* 覓語伎 0螳 伎 覦覲給.
{...}
(interval) 覦覲旧襯 讌 讌 給. 襯 れ
{3} : 3覯 覦覲 {,7} : 7覯 危 {2,5} : 2~5覯 覦覲
豺讌
^ 殊語 覿覿 覩誤.
$ 殊語 覿覿 覩誤.
蠏碁9 讌
[...] 讌 覓語 蠏碁9 譴 覓語襯 讌.
[^...] 讌 蠏碁9 覓語襯 誤 襾語(讌)襯 讌.
蠍壱
(escape) 覃 覩碁ゼ 譴.
| (alternation) OR一一 . - ERE
( ) 蟯碁 伎 蠏碁9 覦 覦焔朱一れ .
* POSIX RE - IEEE std 1003.1 (International standard)
* ERE - Extended Regular Expression
- 5. applying pattern
dot/period : . - any single character
c.b : cab, cbb, ccb, cdb, c1b, c2b 焔
a..b : axyb, a12b, ax0b, a#-b 焔
a.........b : 企 覦朱 一 .
- 6. applying pattern (con't)
?, +, *, {m,n} - iteration, interval
X?ML : XML or ML
can+ : can, cann, cannn, cannnn, ...
can* : ca, can, cann, cannn, ...
http.* : http://, httpd, https, http1234
"http"れ 企 覓語 覿
abc{2,5} : abcc, abccc, abcccc, abccccc
interval expression 覈覈 , RE matching engine 讌 .
- 8. applying pattern (con't)
[ ], [^ ] - character class
[abcd] : a, b, c, d
[0-9] : 0, 1, 2, ... , 9
[a-zA-Z0-9] : 覯滑骸
[^0-9] : [0-9] 誤 襾語
^豌企ゼ 蠏碁9 る?
^ [ 覦襦 れ襷 れ 朱 .
轟 escape り碓...
interval expression 覈覈 伎 旧 l伎朱 讌.
e.g. awk
- 9. greedy matching
greedy matching 企?
$ var2="Its gonna be <b>real</b>Its gonna <i>change everything
</i> I feel"
$ echo $var2 | egrep -o "<.+>"
<b>real</b>It's gonna <i>change everything</i>
pattern 豕 襷 襷れ広 り
greedy matching result set 覯襯 譴螳覃伎
燕襦...
non-greedy matching 企?
greedy matching 蟆郁骸襯 豕 襷れ広 蟆郁骸.
- 10. non-greedy matching (con't)
non-greedy matching
$ var2="Its gonna be <b>real</b>Its gonna <i>change everything
</i> I feel"
$ echo $var2 | egrep -o "<.+>"
<b>real</b>It's gonna <i>change everything</i>
$ echo $var2 | egrep -o "<[^<>]+>"
<b>
</b>
<i>
</i>
- 11. back-reference
襷れ広 蟆郁骸襯 れ (覦焔朱一)
"( )"襦 覓苦 襷れ広 覿覿 "#" 襦
(# 螳 襦), 0覯 豌 襷れ広 蟆郁骸
$ egrep "^(.+):x:[0-9]+:[0-9]+:.*:/home/1:" /etc/passwd
sunyzero:x:500:500:Steven Kim:/home/sunyzero:/bin/bash
linuxer:x:502:502::/home/linuxer:/bin/bash
$ egrep -v "^(.+):x:[0-9]+:[0-9]+:.*:/home/1:" /etc/passwd
... (, 蠍 覦) ...
-v : invert
--color : Surround the matched (non-empty) strings
- 12. back-reference (cont)
back-reference : tag襦 螳語讌 覿覿 豢豢
$ var2="Its gonna be <b>real</b>Its gonna <i>change everything
</i> I feel"
$ echo $var2 | egrep -o "<([a-zA-Z0-9]+)>.*</1>"
<b>real</b>
<i>change everything</i>
$ echo $var2 | egrep --color "<([a-zA-Z0-9]+)>.*</1>"
... ...
- 13. Tip! - sed (stream ed)
substitution (sed)
vim substitution command 螳
$ var2="Its gonna be <b>real</b>Its gonna <i>change everything
</i> I feel"
$ echo $var2 | sed -e "s/<[^<>]+>/ /g"
It's gonna be real It's gonna change everything I feel
$ echo $var2 | sed -e "s,<[^<>]+>, ,g"
vim substitution command sed 蠍磯レ 蟆訖企!
= sed襯 覃 vim 螻... UNIX 企蟆 襦 郁 蠍磯ルれ 襷.
- 14. Tip! - awk
awk 覈 蠍磯レ 蟲 .
$ var2="Its gonna be <b>real</b>Its gonna <i>change everything
</i> I feel"
$ echo $var2 | awk '{ gsub(/[ ]*<[^<>]+>[ ]*/, " "); print }'
Its gonna be real Its gonna change everything I feel
- 15. alternation
( ) alternation 襦
"( )" alternation 企 pattern group 覓苦 .
$ echo "cat is not dog" | egrep -o "(cat|dog)"
cat
dog
$ echo "My Childhood~~~ bye bye" | egrep -o "(child|boy)?hood"
hood
- 16. predefined character class
企 覈
[[:alnum:]] 覯滑骸 れ 覈
[[:alpha:]] 覯葛 (覓語)
[[:blank:]] Tab(t) 覩
[[:cntrl:]] 企語れ 覩
[[:digit:]] れ 覩
[[:xdigit:]] 16讌(hex) れ 覩, 讀 0-9a-fA-F 襯 .
[[:upper:]] 覯 覓語
[[:lower:]] 覯 覓語
[[:space:]] tab(t), CR(r), New line(n) .
[[:print:]] 豢 螳ロ 覓語
[[:graph:]] 螻給葦 誤 覓語
[[:punct:]] 豢 螳ロ 轟覓語
- 17. predefined character class (con't)
[...] 譟壱螳
$ var5="sunyzero@email.com:010-8500-80**:Sun-young Kim:AB-0105R"
$ echo $var5 | egrep -o "^[[:alpha:]@]+"
sunyzero@email
$ echo $var5 | egrep -o "[[:upper:][:digit:]-]{8}"
010-8500
AB-0105R
sunyzero@email蟾讌襷 碁. 覈 り る?
- 18. boundary - ERE
word 蟆所 蟆
b boundary螳 襷 襷 谿場給. ( 蟆所覃 蟆)
B boundary 襷讌 襷 谿場給. ( 蟆所覃伎 蟆曙磯 蟆)
$ var3="abc? <def> 123hijklm"
$ echo $var3 | egrep -o "[a-j]+"
$ echo $var3 | egrep --color "B[a-j]+B"
abc? <def> 123hijklm
abc
def
hij
$ echo $var3 | egrep --color "b[a-j]+b"
abc? <def> 123hijklm
- 19. REGEX and PCRE
POSIX REGEX
螳 襷れ広 .
伎 覲旧″ 企覃 焔レ螳 覦.
豌 蠎 POSIX REGEX覿 牛伎朱 .- Standard蟾!
PCRE (Perl Compatible Regular Expr.)
perl ル 蠏
襷れ 觜襯 , ル ...
C, C++, 蠍壱 覿覿 語願 讌. (豢螳 殊企襴襦 螻)
る企朱 PCRE襯 ク .