A New Semantic Similarity Based Measure for Assessing Research Contribution
Petr Knoth & Drahomira Herrmannova
Knowledge Media Institute, The Open University
Current impact metrics
•  Pros: simplicity, availability for evaluation purposes
•  Cons: insufficient evidence of quality and research contribution
Problems of current impact metrics
•  Sentiment, semantics, context and motives [Nicolaisen, 2007]
•  Popularity and size of research communities [Brumback, 2009; Seglen, 1997]
•  Time delay [Priem and Hemminger, 2010]
•  Skewness of the distribution [Seglen, 1992]
•  Differences between types of research papers [Seglen, 1997]
•  Ability to game/manipulate citations [Arnold and Fowler, 2010; Editors, 2006]
Alternative metrics
•  Alt-/Webo-metrics etc.
–  Impact still dependent on the number of interactions in a scholarly communication network
•  Full-text (Semantometrics)
–  Contribution to the discipline dependent on the content of the manuscript.
Approach
Premise: Full text is needed to assess a publication's research contribution.
Hypothesis: The added value of publication p can be estimated based on the semantic distance from the publications cited by p to the publications citing p.
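To make "semantic distance" concrete, here is a minimal sketch that takes distance as 1 − similarity, as the contribution-measure slide defines it. The slides do not say which similarity function is used; cosine similarity over raw term counts, whitespace tokenisation, and the function names `sim` and `dist` are all assumptions made for illustration.

```python
import math
from collections import Counter


def sim(text_a: str, text_b: str) -> float:
    """Cosine similarity of raw term-frequency vectors.

    Assumption: the slides leave sim unspecified; any similarity
    in [0, 1] computed from the full text could be swapped in.
    """
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as sharing nothing
    return dot / (norm_a * norm_b)


def dist(text_a: str, text_b: str) -> float:
    """Semantic distance as the complement of similarity."""
    return 1.0 - sim(text_a, text_b)
```

With this choice, two documents with identical vocabulary get a distance near 0 and documents with no terms in common get a distance of 1.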
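Contribution measure
[Diagram: publication p between the set A (its references) and the set B (publications citing it); dist(a,b) spans the two sets, dist(b1,b2) lies within one set.]

\[
\mathrm{Contribution}(p) = \frac{\overline{B}}{\overline{A}} \cdot \frac{1}{|B| \cdot |A|} \cdot \sum_{a \in A,\, b \in B,\, a \neq b} \mathrm{dist}(a, b)
\]

\[
\overline{X} =
\begin{cases}
1 & |A| = 1 \lor |B| = 1 \\[4pt]
\dfrac{1}{|X|\,(|X| - 1)} \cdot \displaystyle\sum_{x_1 \in X,\, x_2 \in X,\, x_1 \neq x_2} \mathrm{dist}(x_1, x_2) & |A| > 1 \land |B| > 1
\end{cases}
\]

\[
\mathrm{dist}(a, b) = 1 - \mathrm{sim}(a, b)
\]

\overline{A}, \overline{B}: average distance of the set members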
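The measure can be sketched in a few lines of code. This is an illustrative reading of the formula, not the authors' implementation: the function names, the `dist` callable passed in by the caller, and the error raised for empty sets are assumptions. Documents are assumed to be distinct within each set.

```python
from itertools import permutations
from typing import Callable, Sequence, TypeVar

Doc = TypeVar("Doc")
Dist = Callable[[Doc, Doc], float]


def avg_internal_distance(xs: Sequence[Doc], singleton_case: bool, dist: Dist) -> float:
    """Average distance of the set members (the barred X on the slide)."""
    if singleton_case:  # slide: value is 1 when |A| = 1 or |B| = 1
        return 1.0
    n = len(xs)
    # Sum over ordered pairs of distinct members, normalised by |X|(|X| - 1).
    total = sum(dist(x1, x2) for x1, x2 in permutations(xs, 2))
    return total / (n * (n - 1))


def contribution(cited: Sequence[Doc], citing: Sequence[Doc], dist: Dist) -> float:
    """Contribution(p), with A = publications cited by p and B = publications citing p."""
    if not cited or not citing:
        raise ValueError("both A and B must be non-empty")  # assumed handling
    singleton = len(cited) == 1 or len(citing) == 1
    a_bar = avg_internal_distance(cited, singleton, dist)
    b_bar = avg_internal_distance(citing, singleton, dist)
    # Cross-set sum over all (a, b) pairs with a != b.
    cross = sum(dist(a, b) for a in cited for b in citing if a != b)
    # Note: a_bar is 0 if all cited documents are identical under dist,
    # in which case this division fails; the slides do not cover that case.
    return (b_bar / a_bar) * cross / (len(citing) * len(cited))
```

Passing `dist` in as a parameter keeps the measure independent of how similarity is computed, so any full-text similarity can be plugged in.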
Datasets
•  Requirements
–  Availability of full-text
–  Density
–  Multidisciplinarity
–  (Availability of citations)
Datasets
Dataset                Full-text   Density   Multidisciplinarity
CORE                   ✓           ✗         ✓
Open Citation Corpus   ✓           -         ✗
ACM Dataset            ✗           -         ✓
DBLP+Citation          ✗           -         ✓
iSearch Collection     ✓           ✗         ✗
Our dataset
•  10 seed publications from CORE with varying levels of citations
•  missing citing and cited publications downloaded manually
•  only freely accessible English documents were downloaded
•  in total 716 documents (~50% of the complete network)
•  2 days to gather the data
Results
Publication no.   |B| (Citation score)   |A| (No. of references)   Contribution
1                 5 (9)                  6 (8)                     0.4160
2                 7 (11)                 52 (93)                   0.3576
3                 12 (20)                15 (31)                   0.4874
4                 14 (27)                27 (72)                   0.4026
5                 16 (30)                12 (21)                   0.5117
6                 25 (41)                8 (13)                    0.4123
7                 39 (71)                70 (128)                  0.4309
8                 53 (131)               3 (10)                    0.5197
9                 131 (258)              22 (32)                   0.5058
10                172 (360)              17 (20)                   0.5004
Total             474 (958)              232 (428)
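The totals row can be tied back to the dataset figures (716 documents, about 50% of the complete network) with a short check. It assumes, as the sums suggest, that the first number in each cell counts documents actually collected and the number in parentheses counts the full network, and that the 10 seed publications are included in the 716. Possible overlaps between the seeds' neighbourhoods are ignored.

```python
# (collected, full) counts per seed publication, copied from the table above.
citing = [(5, 9), (7, 11), (12, 20), (14, 27), (16, 30),
          (25, 41), (39, 71), (53, 131), (131, 258), (172, 360)]
cited = [(6, 8), (52, 93), (15, 31), (27, 72), (12, 21),
         (8, 13), (70, 128), (3, 10), (22, 32), (17, 20)]
SEEDS = 10  # assumed to be counted in the 716 documents


def totals(rows):
    """Return (sum of collected, sum of full) over the rows."""
    return sum(got for got, _ in rows), sum(full for _, full in rows)


citing_got, citing_all = totals(citing)
cited_got, cited_all = totals(cited)

collected = citing_got + cited_got + SEEDS
network = citing_all + cited_all + SEEDS
coverage = collected / network

print(f"collected={collected}, network={network}, coverage={coverage:.1%}")
```

Under these assumptions the collected count reproduces the 716 documents reported for the dataset, and the coverage comes out just over one half.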
Current impact metrics vs Semantometrics
Unaffected by                                      Current impact metrics   Semantometrics
Citation sentiment, semantics, context, motives    ✗                        ✔
Popularity & size of research communities          ✗                        ✔
Time delay                                         ✗                        ✗/✔*
Skewness of the citation distribution              ✗                        ✔
Differences between types of research papers       ✗                        ✔
Ability to game/manipulate the metrics             ✗                        ✗/✔**
*  reduced to 1 citation
** assuming that self-citations are not taken into account
Conclusions
•  Full-text necessary
•  Semantometrics are a new class of methods
•  We showed one method to assess the research contribution
References
•  Jeppe Nicolaisen. 2007. Citation Analysis. Annual Review of Information Science and Technology, 41(1):609-641.
•  Douglas N Arnold and Kristine K Fowler. 2010. Nefarious numbers. Notices of the American Mathematical Society, 58(3):434-437.
•  Roger A Brumback. 2009. Impact factor wars: Episode V -- The Empire Strikes Back. Journal of Child Neurology, 24(3):260-2, March.
•  The PLoS Medicine Editors. 2006. The impact factor game. PLoS Medicine, 3(6), June.
•  Jason Priem and Bradley M. Hemminger. 2010. Scientometrics 2.0: Toward new metrics of scholarly impact on the social Web. First Monday, 15(7), July.
•  Per Ottar Seglen. 1992. The Skewness of Science. Journal of the American Society for Information Science, 43(9):628-638, October.
•  Per Ottar Seglen. 1997. Why the impact factor of journals should not be used for evaluating research. BMJ: British Medical Journal, 314(February):498-502.