Explanations
phrases that denote purpose or reason within the text
Negative Logits
utar     -0.17
-arm     -0.15
iterr    -0.14
morph    -0.14
ande     -0.14
еж       -0.14
lland    -0.14
modo     -0.14
562      -0.13
uito     -0.13
Positive Logits
ilst          0.16
ernel         0.15
.synthetic    0.15
rea           0.15
904           0.14
ANO           0.14
antt          0.14
OLS           0.14
aves          0.14
اÙĨÙĪ          0.13
Activations Density: 0.242%