Explanations
phrases expressing obligation or necessity
Negative Logits
olem            -0.16
mada            -0.15
oras            -0.15
elage           -0.15
successfully    -0.14
uhe             -0.14
окол            -0.14
eyim            -0.14
phia            -0.14
gree            -0.14
Positive Logits
ashamed         0.20
avoided         0.20
kept            0.18
given           0.16
approached      0.16
careful         0.16
examined        0.15
warning         0.15
viewed          0.15
carefully       0.15
Activations Density: 0.116%