INDEX
Explanations
commas followed by common words
items following punctuation
Negative Logits
y    1.61
l    1.59
ע    1.52
ی    1.39
u    1.20
s    1.17
c    1.13
b    1.05
t    1.02
ى    1.01
Positive Logits
for           1.12
Perché        0.92
ка            0.91
absurd        0.87
expedit       0.87
జేపీ          0.84
grizz         0.84
médicos       0.83
cristianos    0.82
abhor         0.82
Activations Density: 0.054%