Explanations
punctuation and special character patterns, particularly apostrophes and quotation marks
Negative Logits
åłĤ        -0.16
984        -0.15
449        -0.15
996        -0.14
ivant      -0.14
modes      -0.14
lauf       -0.14
âĢĮاÙĦ    -0.14
sse        -0.13
osu        -0.13
Positive Logits
ÂĢÂĻ       0.20
edar       0.15
нак     0.15
owed       0.14
zbollah    0.14
quist      0.14
-ts        0.14
ogene      0.14
wan        0.14
âĢį        0.13
Activations Density 0.067%
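
Several tokens in the logit lists above (for example åłĤ, âĢĮاÙĦ, ÂĢÂĻ) look like the printable stand-ins that GPT-2-style byte-level BPE tokenizers use for raw bytes. The page does not say which tokenizer produced them, so the following is only a minimal sketch under that assumption; `decode_token` is a hypothetical helper name, not part of any dashboard API. It rebuilds the standard byte-to-character table and inverts it to recover the underlying text.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE.

    Assumption: the dashboard tokens were rendered with this mapping.
    """
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into text (hypothetical helper)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens can end mid-character, so replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["åłĤ", "âĢĮاÙĦ", "âĢį", "ÂĢÂĻ", "нак", "ivant"]:
        # ascii() makes zero-width and control characters visible.
        print(ascii(tok), "->", ascii(decode_token(tok)))
```

If the assumption holds, åłĤ decodes to the CJK character U+5802, âĢĮاÙĦ to a zero-width non-joiner (U+200C) followed by the Arabic letters U+0627 U+0644, and âĢį to a zero-width joiner (U+200D). The top positive token ÂĢÂĻ decodes to the two control characters U+0080 U+0099, which is what the tail of a right single quotation mark looks like after being mis-decoded as Latin-1 and re-encoded as UTF-8; that would be consistent with the explanation's mention of apostrophes and quotation marks. Plain ASCII tokens such as ivant pass through unchanged.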