INDEX
Explanations
references to the word "who" in various contexts
Negative Logits
mente    -0.23
ted      -0.19
ting     -0.16
нен      -0.15
ning     -0.15
ned      -0.15
gado     -0.15
IEL      -0.15
uen      -0.14
erais    -0.14
Positive Logits
else         0.24
_else        0.16
oping        0.16
soever       0.16
osh          0.16
TestFixture  0.15
apk          0.14
oles         0.14
else         0.14
beer         0.14
Activations Density 0.033%