INDEX
Explanations
negative expressions or sentiments in the text
Negative Logits
Token       Logit
<eos>       -0.59
on          -0.56
pen         -0.50
K           -0.49
z           -0.48
ablon       -0.48
Martinez    -0.48
oxin        -0.47
&           -0.47
No          -0.47
Positive Logits
Token       Logit
ſelf        1.18
Monfieur    1.13
auffi       1.09
ſelves      1.08
itſelf      1.08
myſelf      1.08
houſe       1.05
ſy          1.05
iſt         1.04
ſche        1.02
Activations Density 0.004%