Explanations
occurrences of punctuation marks, particularly periods
Negative Logits
Token        Logit
dete         -0.17
ÂĿ           -0.14
↵↵           -0.14
âce          -0.14
ноÑĪ         -0.14
PÅĻÃŃ        -0.14
chedulers    -0.13
""           -0.13
ataires      -0.13
↵↵           -0.13
Positive Logits
Token        Logit
00           0.52
000          0.45
0            0.44
02           0.44
05           0.43
01           0.43
03           0.42
06           0.41
04           0.41
08           0.40
Activations Density 0.483%