INDEX
Explanations
mentions of physical pain or discomfort
terms related to various types of pain or discomfort
Negative Logits
Token     Logit
ODUCT     -0.78
ogue      -0.73
chrom     -0.70
Orient    -0.64
$$$$      -0.64
ship      -0.64
oting     -0.64
DERR      -0.63
Arist     -0.59
pregn     -0.59
Positive Logits
Token     Logit
lla        0.94
utic       0.93
phrine     0.90
rette      0.87
tto        0.87
ments      0.81
hene       0.81
tta        0.81
lli        0.81
te         0.80
Activations Density 0.027%