INDEX
Explanations
words and phrases that express negative feelings, disapproval, or unethical behavior
negative sentiment
NEGATIVE LOGITS
       -0.49
dr     -0.47
p      -0.47
pex    -0.46
La     -0.46
dev    -0.44
Ber    -0.44
↵↵     -0.44
N      -0.44
Int    -0.44
POSITIVE LOGITS
itſelf          0.90
Majefty         0.88
pleaſure        0.83
Monfieur        0.81
Jefus           0.80
―――――           0.78
fubject         0.78
ſever           0.78
RenderAtEndOf   0.78
doubtnut        0.77
Activations Density 2.317%