Explanations
instances of personal pronouns and their variations
Negative Logits
Token              Logit
Administrativna    -0.89
ロウィン            -0.77
Италијани          -0.76
CURIAM             -0.75
niſſe              -0.74
lenker             -0.74
zwiſchen           -0.72
<unused68>         -0.72
<unused41>         -0.72
<unused8>          -0.72
Positive Logits
Token              Logit
-                  0.46
The                0.46
_                  0.38
[no token shown]   0.35
In                 0.35
0                  0.33
(                  0.32
For                0.31
There              0.31
will               0.30
Activations Density 1.344%
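
The density figure above is not defined on the page. A common reading, assumed here, is that it is the fraction of token positions on which the feature has a nonzero activation. The sketch below shows that calculation on made-up data; the function name `activation_density`, the `threshold` parameter, and the toy activations are all hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    The dashboard itself does not state how its figure is computed.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1,000 token positions, 13 of which fire.
toy = [0.0] * 987 + [0.5] * 13
density = activation_density(toy)
print(f"{density:.3%}")  # prints 1.300%
```

Under this reading, a density of 1.344% would mean the feature fires on roughly 13 of every 1,000 tokens in the sample.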