INDEX
Explanations
mentions or references to human skin
references to skin color and related themes of identity
Negative Logits
Token     Logit
shire     -0.70
Phant     -0.67
Leap      -0.66
Leader    -0.66
umar      -0.64
Mun       -0.64
Lever     -0.63
Dane      -0.63
Vide      -0.63
PUT       -0.62
Positive Logits
Token     Logit
ned       1.52
ning      1.30
ny        1.05
beard     1.01
skinned   0.99
burn      0.95
walker    0.95
graft     0.94
tones     0.89
powder    0.88
Activations Density: 0.040%