INDEX
Explanations
repeated characters or symbols, particularly vowels with diacritics
Negative Logits
isci    -0.18
itaire  -0.15
Fiction -0.15
mez     -0.15
jc      -0.15
ÙIJب     -0.15
Berger  -0.15
gli     -0.14
offs    -0.14
avers   -0.14
Positive Logits
ldre    0.21
olid    0.17
olian   0.17
gypt    0.16
olist   0.16
rz      0.15
t       0.15
neas    0.15
hn      0.15
onde    0.15
Activations Density: 0.006%