INDEX
Explanations
mentions of the year 2008
references to the year 2008
Negative Logits
ancest          -0.79
afort           -0.66
Magikarp        -0.66
hma             -0.64
oÄŁ             -0.64
unpre           -0.63
kef             -0.61
unin            -0.61
lying           -0.60
subst           -0.60
Positive Logits
å¹              0.88
-'              0.83
ilton           0.72
é¾              0.69
Reloaded        0.68
wark            0.67
aeda            0.66
Reconstruction  0.66
clinton         0.65
onwards         0.65
Activations Density 0.024%