INDEX
Explanations
punctuation marks, particularly parentheses and quotation marks
Negative Logits
Token     Logit
toa       -0.17
UNT       -0.15
.asc      -0.14
Wy        -0.14
/opt      -0.13
âce       -0.13
èĪ        -0.13
mmo       -0.13
eid       -0.13
andler    -0.13
Positive Logits
Token     Logit
ulla      0.15
insky     0.14
ús        0.14
McCabe    0.14
and       0.14
lec       0.14
mant      0.13
座        0.13
¡         0.13
пи        0.13
Activations Density: 0.057%