Explanations
words related to body parts
references to physical appearances, particularly focused on body parts and health-related terms
Negative Logits
theless      -0.70
FISA         -0.65
nown         -0.65
Archangel    -0.64
DRAG         -0.64
Pilgrim      -0.64
quartered    -0.63
Defenders    -0.63
inement      -0.63
Witness      -0.63
Positive Logits
ours         0.92
pton         0.92
geries       0.87
ahon         0.86
gery         0.82
mat          0.81
arter        0.79
tum          0.79
sheets       0.78
tub          0.77
Activations Density 0.019%