INDEX
Explanations
phrases that indicate seeking information or discovery
Negative Logits
onto      -0.17
itsu      -0.17
zoekt     -0.15
kel       -0.14
uments    -0.14
ito       -0.14
anzi      -0.14
innie     -0.13
ched      -0.13
velop     -0.13
Positive Logits
about        0.23
_about       0.18
about        0.16
strcasecmp   0.16
zia          0.14
braco        0.14
bout         0.14
θι           0.14
how          0.14
asin         0.14
Activations Density 0.015%