Explanations
phrases that introduce hypothetical scenarios or assumptions
Negative Logits
[token not shown]   -1.33
(                   -1.10
.                   -1.08
"                   -1.07
<eos>               -1.06
The                 -1.02
↵                   -1.02
A                   -1.01
↵↵                  -1.00
B                   -0.98
Positive Logits
myſelf              2.00
Efq                 1.98
itſelf              1.92
purpoſe             1.84
pleaſure            1.83
houſe               1.81
becauſe             1.81
Monfieur            1.80
ſtate               1.80
Jefus               1.79
Activations Density 0.232%
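
The density figure above is usually read as the fraction of token positions on which the feature fires. The following is a minimal sketch of that calculation, assuming density means "share of positions with activation above a threshold"; the function name `activation_density`, the zero threshold, and the toy data are hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" is the share of token positions on which the
    feature is active. The threshold of 0.0 is illustrative only.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, feature active on 2 of them.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.232% would correspond to the feature firing on roughly 2 to 3 of every 1000 token positions.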