INDEX
Explanations
The token "by" followed by a verb or a time expression
Negative Logits
K       1.15
ية      1.14
in      1.02
ل       1.00
ר       0.98
ar      0.91
en      0.89
em      0.89
ن       0.88
J       0.85
Positive Logits
'       1.30
and     0.89
งาน     0.89
는데    0.87
จะ      0.86
by      0.85
지      0.83
virtue  0.82
cation  0.82
        0.82
Activations Density 0.176%