Explanation: terms associated with adoption
Negative Logits
Token     Logit
.         -0.72
<eos>     -0.63
*         -0.60
(         -0.60
;         -0.59
,         -0.59
here      -0.59
-         -0.58
I         -0.57
↵↵        -0.57
Positive Logits
Token     Logit
Efq       1.44
myſelf    1.43
Jefus     1.33
itſelf    1.32
purpoſe   1.31
Monfieur  1.28
^(@)      1.26
―――――     1.26
Majefty   1.23
ſelf      1.21
Activations Density: 0.140%