INDEX
Explanations
proper nouns and named entities in various contexts
Negative Logits
M    -0.58
T    -0.57
F    -0.56
F    -0.53
D    -0.52
D    -0.50
p    -0.50
K    -0.50
E    -0.50
L    -0.49
Positive Logits
ſelf        1.22
Autoritní   1.22
transQ      1.19
itſelf      1.13
neceſſ      1.11
Signalez    1.10
ſever       1.09
raiſ        1.07
himſelf     1.06
myſelf      1.06
Activations Density: 0.991%