Explanations
mentions of age, particularly related to the number 18
Negative Logits
Token     Logit
pend      -0.19
anch      -0.19
ayah      -0.17
arth      -0.17
aries     -0.16
wel       -0.16
yte       -0.16
595       -0.15
kits      -0.15
yu        -0.15
Positive Logits
Token     Logit
-hole     0.17
ร         0.17
inded     0.17
th        0.16
arend     0.15
ly        0.15
-century  0.15
beros     0.15
isms      0.14
ieder     0.14
Activations Density: 0.142%