Explanations
negative connotations or criticisms related to experience and quality
Negative Logits
myſelf     -1.55
Theſe      -1.52
Efq        -1.51
―――――      -1.41
pleaſure   -1.41
Monfieur   -1.39
faſt       -1.39
itſelf     -1.38
whoſe      -1.37
becauſe    -1.35
Positive Logits
<eos>      1.32
↵↵         1.10
↵          0.95
           0.84
The        0.80
(          0.78
In         0.73
a          0.72
the        0.70
I          0.70
Activations Density 0.671%