Explanations
occurrences of the word "hit" or similar words relating to impact
Negative Logits
Token      Logit
'          -1.03
"          -1.00
(          -0.97
(blank)    -0.96
↵↵         -0.96
<eos>      -0.93
(blank)    -0.91
G          -0.87
C          -0.86
a          -0.83
Positive Logits
Token      Logit
Efq        1.76
―――――      1.59
Theſe      1.55
Majefty    1.54
Monfieur   1.48
itſelf     1.47
Jefus      1.45
auffi      1.42
myſelf     1.42
Anſ        1.41
Activations Density 3.429%