Explanations
phrases related to personal beliefs and moral dilemmas
Negative Logits
Efq        -1.07
houſe      -1.04
purpoſe    -1.01
Jefus      -1.01
Majefty    -1.00
Monfieur   -0.98
pleaſure   -0.98
ſta        -0.97
himſelf    -0.92
myſelf     -0.92
Positive Logits
my         0.55
I          0.48
isn        0.47
that       0.46
[          0.45
cre        0.43
die        0.43
no         0.42
#          0.42
human      0.42
Activations Density: 2.655%