INDEX
Explanations
references to specific ages or time periods
Negative Logits
Token     Logit
ãģ¾ãģŁ    -0.17
:eq       -0.16
anch      -0.16
pend      -0.16
uen       -0.15
indow     -0.15
anmar     -0.15
اظ        -0.15
aro       -0.15
ask       -0.14
Positive Logits
Token     Logit
-hole     0.18
hole      0.17
ires      0.16
holes     0.15
elerik    0.15
quat      0.15
ilon      0.15
alis      0.14
ewolf     0.14
red       0.14
Activations Density: 0.124%