INDEX
Explanations
various expressions of attitudes, particularly negative and hostile ones
Negative Logits
orman      -0.17
ekim       -0.16
Callable   -0.15
ijo        -0.14
PN         -0.14
onom       -0.14
งาน        -0.14
_prim      -0.14
lico       -0.13
Pant       -0.13
Positive Logits
towards    0.81
toward     0.77
Towards    0.65
Towards    0.59
hacia      0.54
Tow        0.54
owards     0.43
向         0.41
oward      0.40
verso      0.40
Activations Density 0.203%