INDEX
Explanations
so/very followed by descriptive word
Negative Logits
و         0.41
ل         0.38
ور        0.37
ين        0.35
expts     0.34
ك         0.34
codons    0.33
’         0.33
чках      0.33
수에      0.33
Positive Logits
was       0.43
to        0.43
to        0.41
          0.36
tn        0.36
pada      0.36
ton       0.35
\         0.35
ti        0.34
ts        0.33
Activations Density 0.292%