INDEX
Explanations
negative or critical sentiments and terms related to health or medical treatments
Negative Logits
Efq            -1.23
myſelf         -1.12
itſelf         -1.08
Jefus          -1.04
Theſe          -1.03
Majefty        -0.98
snippetHide    -0.97
chofe          -0.96
ſelf           -0.94
becauſe        -0.94
Positive Logits
pre            0.59
Pre            0.57
Pre            0.55
ex             0.54
               0.53
pr             0.53
pr             0.52
               0.52
-              0.51
pre            0.51
Activations Density 0.326%