Explanations
Phrases related to negative health conditions, particularly physical and mental illness
Negative Logits
unlaw        -0.75
Unch         -0.68
principals   -0.67
Goodwin      -0.67
wcsstore     -0.67
sanctioned   -0.66
guid         -0.64
Noir         -0.64
Hier         -0.62
CLS          -0.61
Positive Logits
ening        1.49
ened         1.34
bay          1.25
nesses       0.99
ness         0.93
ly           0.93
er           0.91
estro        0.90
igue         0.90
ens          0.89
Activations Density: 0.054%