Explanations
the word "due", and also names
Negative Logits
(token not rendered)   -1.33
(token not rendered)   -1.20
↵                      -1.02
(                      -0.96
(token not rendered)   -0.91
_                      -0.90
I                      -0.90
↵↵                     -0.88
:                      -0.85
<eos>                  -0.84
Positive Logits
Majefty       1.97
myſelf        1.95
Efq           1.82
purpoſe       1.80
itſelf        1.75
Jefus         1.72
pleaſure      1.70
himſelf       1.66
ſelf          1.63
themſelves    1.62
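Dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that convention. The helpers `logit_effects` and `top_and_bottom`, the toy vocabulary, and the weights are all hypothetical illustrations, not this dashboard's code or this model's parameters.

```python
def logit_effects(decoder_dir, W_U):
    # Assumed convention: score each vocabulary token t as the dot product of the
    # feature's decoder direction with column t of the unembedding matrix W_U
    # (W_U has d_model rows and vocab_size columns). Toy data only.
    d_model = len(decoder_dir)
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(tokens, scores, k):
    # Sort (token, score) pairs from most negative to most positive effect.
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]        # analogue of the "Negative Logits" list
    top = ranked[::-1][:k]     # analogue of the "Positive Logits" list
    return bottom, top


# Toy example: d_model = 2, vocabulary of 4 tokens (made-up values).
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
W_U = [
    [0.2, -1.0, 0.5, 0.0],
    [0.4, 0.6, -1.0, 0.1],
]
scores = logit_effects(decoder_dir, W_U)
bottom, top = top_and_bottom(tokens, scores, k=2)
print([t for t, _ in bottom])  # ['b', 'd']
print([t for t, _ in top])     # ['c', 'a']
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a full-size vocabulary, a partial selection such as `heapq.nlargest` and `heapq.nsmallest` would avoid sorting every token.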
Activations Density: 0.880%
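If the density figure is read as the share of token positions where the feature's activation is positive (an assumption about how this dashboard defines it), then 0.880% means the feature fires on roughly 88 of every 10,000 tokens. A minimal sketch of that calculation, using a hypothetical `activation_density` helper and made-up activations:

```python
def activation_density(activations, threshold=0.0):
    # Fraction of token positions at which the feature fires, i.e. its
    # activation is strictly above `threshold` (0.0 is an assumed cutoff).
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up example: the feature fires on 88 of 10,000 token positions.
acts = [0.0] * 9_912 + [1.5] * 88
density = activation_density(acts)
print(f"{100 * density:.3f}%")  # 0.880%
```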