INDEX
Explanations
references to online platforms like wikis and Wikipedia
mentions of "Wiki" or "Wikipedia"
Negative Logits
Token            Logit
Samson           -0.77
luster           -0.74
ringing          -0.71
Opportunity      -0.67
lled             -0.67
bath             -0.67
complementary    -0.64
acted            -0.64
period           -0.64
Ortiz            -0.63
Positive Logits
Token            Logit
Wiki             3.81
Wiki             3.21
wiki             3.04
wik              2.80
wiki             2.59
Wik              2.08
wik              1.83
Wik              1.80
pedia            1.78
edia             1.60
Activations Density 0.029%