Explanations
questions about knowledge or understanding
Negative Logits
Token          Logit
ishable        -1.14
otom           -1.13
isco           -1.07
izont          -1.06
attery         -1.05
stros          -1.03
ouls           -1.03
acco           -1.02
aez            -1.02
interstitial   -1.01
Positive Logits
Token          Logit
how            1.29
lege           1.18
exactly        1.15
whether        1.06
why            1.05
what           1.01
ABOUT          1.01
ledge          1.00
ledged         0.97
about          0.96
Activations Density: 0.336%