INDEX
Explanations
the word "one" and its variations in different contexts
Negative Logits
ä¸ĺ        -0.16
Occurred   -0.15
rogram     -0.15
rente      -0.15
киÑĪ       -0.14
agli       -0.14
inois      -0.14
laz        -0.14
igel       -0.13
lant       -0.13
Positive Logits
else       0.34
ever       0.28
except     0.25
except     0.24
else       0.23
else       0.23
EVER       0.23
ever       0.21
ELSE       0.20
-ever      0.20
Activations Density: 0.034%