INDEX
Explanations
statements that assert or emphasize the existence or importance of a subject
Negative Logits
zin    -0.08
acid   -0.07
esi    -0.06
ead    -0.06
hiba   -0.06
dig    -0.06
etest  -0.06
amon   -0.06
zik    -0.06
_fwd   -0.06
Positive Logits
toward   0.09
towards  0.09
tw       0.07
ırak     0.07
onus     0.06
igli     0.06
.tw      0.06
оров     0.06
乎       0.06
how      0.06
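As a rough illustration, suppose these logit lists come from projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. That is a common convention for feature dashboards of this kind, but this page does not confirm it. The sketch below uses plain Python lists; `logit_effects`, `top_logits`, `decoder_vec`, and `unembed` are hypothetical names, and the toy numbers are made up for illustration only.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Score each token by the dot product of the feature's decoder vector
    with that token's column of the unembedding matrix (assumed convention).

    unembed is a list of rows, one per residual-stream dimension,
    each row holding one entry per vocabulary token.
    """
    effects = {}
    for t, token in enumerate(vocab):
        effects[token] = sum(d * row[t] for d, row in zip(decoder_vec, unembed))
    return effects


def top_logits(effects, k):
    """Return (most negative k, most positive k) as (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
vocab = ["toward", "acid", "how"]
decoder_vec = [1.0, 0.5]
unembed = [
    [0.1, -0.2, 0.0],
    [0.0, 0.1, 0.1],
]
negative, positive = top_logits(logit_effects(decoder_vec, unembed, vocab), k=1)
print("negative:", negative)
print("positive:", positive)
```

Here "toward" scores about 0.1, "how" about 0.05, and "acid" about -0.15, so "toward" heads the positive list and "acid" the negative one.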
Activations Density 0.008%
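A density of 0.008% means the feature fires very rarely. Assuming density is defined as the fraction of token positions where the feature's activation is greater than zero (the usual reading of this metric, not stated on the page), it can be computed as in the minimal sketch below; `activation_density` is a hypothetical helper name.

```python
def activation_density(activations):
    """Fraction of token positions where the feature activation is > 0
    (assumed definition of 'activation density')."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


# 8 firing positions out of 100,000 tokens gives 0.00008, i.e. 0.008%.
acts = [0.0] * 99_992 + [1.0] * 8
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this definition, a 0.008% density corresponds to roughly 8 activating tokens per 100,000.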