Explanations
- Words related to the tongue
- References to the word "tongue"
Negative Logits
Token         Logit
ded           -0.92
iary          -0.83
irements      -0.80
Parenthood    -0.78
ding          -0.76
rity          -0.75
Occupations   -0.70
ividual       -0.70
rals          -0.68
idth          -0.68
Positive Logits
Token         Logit
tongue         1.04
tongues        0.89
ice            0.82
lips           0.78
mouth          0.74
ingen          0.74
aware          0.73
slur           0.73
fry            0.73
hook           0.72
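Dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's output direction through the model's unembedding matrix and keeping the k highest and lowest scoring tokens. The sketch below shows that idea in plain Python. It is an assumption about how such lists are usually produced, not a description of how this particular page computed them, and the names `direction`, `unembed`, and `top_logit_effects` are hypothetical.

```python
def top_logit_effects(direction, unembed, vocab, k):
    """Return the k most positive and k most negative tokens for a direction.

    direction: list of d_model floats (hypothetical feature output direction)
    unembed:   d_model rows, each a list of len(vocab) floats (hypothetical W_U)
    vocab:     list of token strings, one per unembedding column
    """
    # Logit effect of each token = dot product of the direction with its column.
    scores = [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(len(vocab))
    ]
    # Sort tokens from most negative to most positive effect.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return positive, negative


# Toy example with a 2-dimensional model and a 3-token vocabulary.
pos, neg = top_logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]],
                             ["a", "b", "c"], k=1)
print(pos, neg)  # [('c', 2.0)] [('b', -1.0)]
```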
Activation density: 0.015%
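Activation density is usually read as the percentage of sampled tokens on which the feature fires (activation above zero). Under that assumption, 0.015% means roughly 15 firing tokens per 100,000. The helper below is a hypothetical sketch of the calculation; the zero threshold is an assumption, since the page does not state the cutoff it used.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds the threshold (assumed 0)."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 15 nonzero activations out of 100,000 tokens gives 0.015 percent.
sample = [0.0] * 100_000
for i in range(15):
    sample[i] = 1.0
print(activation_density(sample))
```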