Explanations
mentions of the word "Wikipedia"
references to Wikipedia and its content
Negative Logits
Token       Logit
rone        -0.76
pter        -0.73
Bethlehem   -0.70
charism     -0.70
moon        -0.66
quer        -0.66
awks        -0.65
◼           -0.65
taboola     -0.64
Elys        -0.64
Positive Logits
Token          Logit
ipedia         1.40
encyclopedia   1.00
Wikipedia      0.97
ileaks         0.96
Commons        0.95
Wikipedia      0.92
wiki           0.89
Leaks          0.88
pedia          0.86
Encyclopedia   0.83
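Dashboards of this kind commonly derive the positive and negative logit lists by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below assumes that method. The function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical; real scores would come from a trained model's weights.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect,
# assuming effect = decoder_dir @ W_U (decoder direction times unembedding matrix).

def logit_effects(decoder_dir, unembed, vocab):
    """Return (token, score) pairs. `unembed` has one row per model dimension
    and one column per vocabulary token."""
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with this token's unembedding column.
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, score))
    return scores

def top_and_bottom(scores, k):
    """Split scores into the k most positive and the k most negative tokens."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # most negative scores first
    return positive, negative

# Toy example: 2-dimensional model, 3-token vocabulary (made-up numbers).
vocab = ["ipedia", "moon", "the"]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, -0.5, 0.0],
    [0.8, -0.2, 0.1],
]
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=1)
print(positive, negative)
```

Ranking on the raw dot product, without normalizing, mirrors how a single feature's contribution would add directly to the output logits under this assumption.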
Activation density: 0.012%
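Activation density is commonly reported as the percentage of tokens in a sample on which the feature's activation is nonzero. A minimal sketch under that assumption follows; the function name `activation_density` and the strictly-positive threshold are illustrative choices, not taken from the dashboard.

```python
# Hypothetical sketch: activation density as the percentage of tokens
# whose feature activation exceeds a threshold (0.0 by default).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 12 active tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(12):
    acts[i * 8_000] = 1.0  # spread the active positions through the sample
print(activation_density(acts))  # about 0.012
```

Read this way, a density of 0.012% means the feature fires on roughly 12 tokens in every 100,000, which fits the narrow Wikipedia-related explanations above.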