INDEX
Explanations
references to physical human bodies
references to bodies, particularly in a context related to victims or remains
Negative Logits
Token      Logit
Hoover     -0.80
BILITY     -0.71
é¾į        -0.70
Channel    -0.67
Liberty    -0.66
Americ     -0.64
UGE        -0.64
Mega       -0.63
BF         -0.63
Cobb       -0.62
Positive Logits
Token      Logit
bodies     1.08
anguage    1.07
guards     1.02
hops       0.98
builders   0.90
tones      0.90
wagen      0.89
nodd       0.87
rats       0.86
ngth       0.86
Activations Density 0.007%
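
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "activations density" means the fraction of sampled tokens on which the feature fires above zero; the source does not define the term, so that reading is an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens with activation > threshold) / (all tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 7 activate the feature.
sample = [0.0] * 99_993 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.007%"
```

Under this reading, a density of 0.007% corresponds to roughly 7 active tokens per 100,000 sampled, which would make this a very sparse feature.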