Explanations
age descriptions, particularly the phrase "old"
references to ages in a specific format
Negative Logits
anwhile     -0.84
akespe      -0.77
hod         -0.76
oldown      -0.75
destro      -0.74
ullivan     -0.74
ainer       -0.73
acly        -0.72
antha       -0.72
ipedia      -0.71
Positive Logits
boy         0.89
girl        0.72
ish         0.72
Tribune     0.72
Jah         0.71
Boy         0.71
Frenchman   0.71
Tav         0.66
York        0.66
Indonesian  0.65
Activations Density: 0.029%