Explanations
mentions of physical illness or negative health conditions
occurrences of the word "sick"
Negative Logits
unlaw       -0.75
unden       -0.67
compr       -0.67
ETHOD       -0.65
Unch        -0.64
inational   -0.64
merce       -0.63
anship      -0.62
AUT         -0.62
adr         -0.61
Positive Logits
ening        1.20
bay          1.14
ened         1.06
nesses       0.94
estro        0.87
le           0.87
ly           0.86
est          0.85
ert          0.85
ness         0.82
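The page does not say how the logit values were computed. In many sparse-autoencoder dashboards they come from projecting the feature's decoder direction through the model's unembedding matrix and then listing the most negative and most positive tokens. The sketch below assumes that approach. The function names, the toy vectors, and the use of plain Python lists in place of real model weights are all hypothetical and for illustration only.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: project a feature's decoder direction (length d_model)
    through an unembedding matrix (d_model rows x vocab columns), giving one
    logit effect per vocabulary token."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative tokens (ascending) and the k most positive
    tokens (descending), mirroring the two lists shown on the dashboard."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive

# Toy example (made-up numbers, not taken from this feature).
decoder_dir = [1.0, -0.5]
unembed = [[0.2, 1.0, -0.4],
           [0.6, -0.2, 0.8]]
vocab = ["a", "b", "c"]
effects = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(effects, vocab, k=1)
```

With these toy inputs, token "c" has the most negative effect (about -0.8) and token "b" the most positive (about 1.1).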
Activation Density: 0.016%
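The dashboard does not define activation density. A common reading is the percentage of token positions at which the feature fires. Under that reading, 0.016% means about 16 firing positions per 100,000 tokens. The sketch below assumes that "firing" means an activation strictly above a threshold of 0. Both the threshold and the function name are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed to be 0, i.e. any positive activation)."""
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in values if a > threshold)
    return 100.0 * fired / len(values)

# Toy example: 16 positive activations among 100,000 token positions.
acts = [0.0] * 100_000
for i in range(16):
    acts[i * 1000] = 0.5
density = activation_density(acts)
```

For this synthetic input, `density` comes out to about 0.016, matching the percentage shown above.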