INDEX
Explanations
expressions of concern and inquiry regarding emotional or psychological states
Negative Logits
ConstraintMaker  -0.77
ujednoznacz      -0.75
müſſen           -0.72
ſei              -0.67
<=",             -0.66
laſſen           -0.65
<unused68>       -0.64
<unused79>       -0.64
<unused8>        -0.63
<unused52>       -0.63
Positive Logits
toHexString  0.30
worse        0.30
halfway      0.29
fucking      0.28
Damit        0.28
they         0.27
összes       0.27
THE          0.26
die          0.26
half         0.26
Activations Density 0.180%