Explanations
phrases indicating fundamental implications or explanations
Negative Logits
col       -0.69
en        -0.68
po        -0.63
in        -0.62
che       -0.59
a         -0.59
y         -0.59
(         -0.59
Col       -0.58
de        -0.58
Positive Logits
Efq       1.56
Monfieur  1.56
Reſ       1.40
myſelf    1.38
purpoſe   1.38
itſelf    1.35
ſelf      1.35
Houſe     1.34
―――――     1.34
ſtate     1.30
Activations Density: 0.223%