INDEX
Explanations
threats and references to danger or harm
Negative Logits
歓                   -0.39
<eos>               -0.39
(no visible token)  -0.39
suy                 -0.39
M                   -0.38
spire               -0.38
O                   -0.38
distanciation       -0.37
ׁ                   -0.36
(                   -0.36
Positive Logits
myſelf              1.02
itſelf              0.96
threaten            0.96
houſe               0.96
Majefty             0.94
threatened          0.94
ſelf                0.94
juſt                0.94
threatens           0.93
ſever               0.92
Activations Density: 0.282%