Explanations
urban and wilderness-related words
references to body parts and medical terminology
Negative Logits
uality         -0.74
ually          -0.71
arians         -0.70
ENTS           -0.67
icable         -0.66
IMAGES         -0.66
ember          -0.64
Debor          -0.64
aterial        -0.63
Surveillance   -0.63
Positive Logits
uckle          1.04
ãĥ¼ãĥ«         0.90
ften           0.90
ffiti          0.89
uckles         0.86
vel            0.84
sein           0.82
bum            0.79
gger           0.77
hander         0.77
Activations Density 0.020%
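
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is obtained. It assumes density means the fraction of token positions on which the feature's activation is above zero; the dashboard does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 2 of which activate the feature.
sample = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.020% corresponds to the feature firing on roughly 2 of every 10,000 tokens.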