Explanations
occurrences of specific formatting markers or placeholders in text
Negative Logits
+#+#            -1.13
autorytatywna   -0.96
CanadaChoose    -0.94
Hentet          -0.94
nahilalakip     -0.93
cherchés        -0.92
Rüyada          -0.92
EconPapers      -0.92
ffilmiau        -0.90
Signalez        -0.90
Positive Logits
↵↵        0.96
(blank)   0.68
↵         0.68
↵↵↵       0.67
<eos>     0.60
↵↵↵↵↵     0.52
However   0.50
.         0.49
↵↵↵↵      0.48
(blank)   0.47
Activations Density 0.021%