Explanations
mentions of young individuals or youth-related topics
Negative Logits
Token       Logit
Young       -0.30
younger     -0.30
Young       -0.29
young       -0.29
young       -0.27
jeune       -0.26
молод       -0.26
youngest    -0.25
jeunes      -0.23
joven       -0.23
Positive Logits
Token       Logit
(er          0.33
blood        0.33
lings        0.32
sters        0.31
ish          0.31
-ad          0.31
stown        0.28
ster         0.28
ling         0.28
/new         0.25
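The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way SAE feature dashboards compute such lists is to project the feature's decoder direction through the model's unembedding matrix and take the lowest and highest scores; that this page uses exactly this procedure is an assumption. The sketch below is pure Python on toy data, and the names `logit_effects`, `decoder_dir`, and `unembed` are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: k most negative and k most positive tokens for one feature.

    decoder_dir: the feature's decoder vector, length d_model (assumed input).
    unembed:     d_model x vocab_size unembedding matrix W_U (assumed input).
    vocab:       token strings, one per unembedding column.
    """
    d_model = len(decoder_dir)
    # Direct logit effect: decoder_dir @ W_U, giving one score per vocabulary token.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score so the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 3 tokens.
decoder = [1.0, -0.5]
W_U = [[0.3, -0.2, 0.1],
       [0.2, 0.4, -0.6]]
neg, pos = logit_effects(decoder, W_U, ["a", "b", "c"], k=1)
print(neg, pos)
```

On a real model, `vocab` would hold tokenizer strings, which is why near-duplicates such as the two `Young` entries can appear: they can be distinct tokens (for example, with and without a leading space) that render identically.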
Activation density: 0.038%
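Activation density is usually reported as the percentage of sampled token positions on which the feature fires. This sketch assumes "fires" means an activation strictly above zero; the threshold and the helper name `activation_density` are assumptions, not taken from the page. At 0.038%, the feature is active on roughly 38 of every 100,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions with activation > threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # count the feature as "firing" on this token
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 38 active positions out of 100,000 corresponds to a density of 0.038%.
acts = [0.0] * 99_962 + [1.0] * 38
print(activation_density(acts))
```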