Explanations
occurrences of punctuation marks and specific sentence structures
Negative Logits
akra             -0.16
imits            -0.15
åIJĪæł¼          -0.14
minul            -0.14
å±Ĭ              -0.14
plein            -0.14
urses            -0.14
istributions     -0.14
âĹĦ              -0.14
/history         -0.14
Positive Logits
bi               0.18
career           0.17
early            0.16
biography        0.16
personal         0.16
Bi               0.16
Bi               0.15
works            0.15
marriage         0.15
as               0.15
Activations Density 0.076%