INDEX
Explanations
instructions followed by a colon
Negative Logits
Token        Value
fourth       0.35
programmer   0.34
rename       0.34
outline      0.33
rary         0.33
programmer   0.33
aryng        0.32
[]}          0.32
auern        0.32
大家         0.31
Positive Logits
Token        Value
uchun        0.44
Osaka        0.41
voor         0.41
:            0.41
(blank)      0.40
غ            0.40
-            0.39
için         0.39
från         0.39
pierde       0.39
Activations Density: 0.044%
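
The density figure can be read as the share of token positions on which the feature fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and used purely for illustration.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions with a
    nonzero (above-threshold) activation. The source does not confirm this.
    """
    values = list(activations)
    if not values:
        # No positions observed, so report zero density rather than dividing by zero.
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
sample = [0.0] * 9996 + [0.7, 1.3, 0.2, 2.1]
density = activation_density(sample)

# Format as a percentage with three decimals, the same style as the figure above.
print(f"{density:.3%}")
```

Under this reading, a density of 0.044% would correspond to roughly 4.4 active positions per 10,000 tokens, which is consistent with a sparse feature.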