INDEX
Explanations
sentences or phrases indicating negative emotional states or dilemmas
Negative Logits
Token           Logit
matchCondition  -0.80
HtmlAttribute   -0.78
sidemargin      -0.77
XmlAccessType   -0.75
__*/            -0.71
reaſon          -0.71
specialchars    -0.71
rungsseite      -0.68
juſ             -0.67
pleaſure        -0.66
Positive Logits
Token       Logit
-           0.45
order       0.43
Nicht       0.42
мато        0.42
<b>         0.41
ly          0.41
A           0.41
collective  0.40
ถม          0.40
하는        0.40
Activations Density: 0.641%