Explanations
imperative verbs and function definitions in code
Negative Logits
amon      -0.17
tember    -0.16
ancies    -0.16
jich      -0.16
à¹Ģส      -0.15
Ú©ÛĮ      -0.14
ÑıÑĩ      -0.14
é¡į       -0.14
گرÛĮ    -0.14
ÏĩÏİ      -0.14
Positive Logits
↵         0.20
:↵        0.16
ince      0.16
zens      0.16
not       0.16
19        0.16
oley      0.15
(blank)   0.15
oulos     0.15
Cong      0.15
Activations Density 0.005%