Explanations
specific definite articles and personal pronouns that indicate identity or involvement
NEGATIVE LOGITS
visor      -0.15
orks       -0.15
isseur     -0.14
ocaust     -0.14
ê³         -0.14
vation     -0.14
Crus       -0.14
erken      -0.14
eyse       -0.14
concent    -0.13
POSITIVE LOGITS
collateral  0.15
364         0.15
ford        0.15
gal         0.14
215         0.14
164         0.14
λά          0.14
311         0.14
itan        0.14
bef         0.14
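The two lists above are the tokens with the most negative and most positive logit values for this feature. Below is a minimal sketch of how such top-k lists can be selected, assuming you already have a mapping from token to logit value (the function name `top_logits` is hypothetical, and the sample mapping reuses only four values copied from the lists above).

```python
import heapq


def top_logits(logit_by_token, k=10):
    """Return (most negative, most positive) k (token, logit) pairs.

    Hypothetical helper: assumes `logit_by_token` maps each token
    string to the feature's logit value for that token.
    """
    items = list(logit_by_token.items())
    # Smallest values first: the most strongly suppressed tokens.
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    # Largest values first: the most strongly promoted tokens.
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return negative, positive


# Sample values copied from the lists above (not the full vocabulary).
logits = {"visor": -0.15, "concent": -0.13, "collateral": 0.15, "bef": 0.14}
neg, pos = top_logits(logits, k=2)
print(neg)  # [('visor', -0.15), ('concent', -0.13)]
print(pos)  # [('collateral', 0.15), ('bef', 0.14)]
```

A heap-based selection avoids sorting the entire vocabulary when only the top k entries are needed.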
Activation density: 0.002%
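A density of 0.002% means the feature fires on roughly 2 of every 100,000 tokens. The following is a minimal sketch of that calculation, assuming density is defined as the fraction of token positions whose activation exceeds a threshold (zero here). The helper name and the activation values are hypothetical and are not taken from this feature's data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Hypothetical helper: assumes `activations` is a flat list holding one
    feature activation per token position.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 2 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
acts[10] = 3.2
acts[500] = 1.7
print(f"{activation_density(acts):.3%}")  # 0.002%
```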