Explanations
mentions of age and age-related categories
Negative Logits
Token    Logit
ners     -0.16
uzzi     -0.16
coni     -0.16
rem      -0.15
znik     -0.15
baz      -0.15
rone     -0.15
iped     -0.15
jen      -0.15
mgr      -0.15
Positive Logits
Token      Logit
-old       0.32
bracket    0.24
brackets   0.21
Bracket    0.21
range      0.20
brush      0.20
-range     0.19
-app       0.19
-group     0.18
group      0.18
Activations Density 0.037%