Explanations
punctuation marks and their occurrences in the text
Negative Logits
aData      -0.82
ajur       -0.80
Percival   -0.77
Furman     -0.75
likle      -0.74
𝐱          -0.73
entanto    -0.73
𝐥          -0.73
Fergus     -0.72
seqs       -0.71
Positive Logits
__*/       1.12
)".        1.10
",         1.07
)",        1.03
'",        1.01
'".        0.97
".         0.93
?",        0.92
Walpole    0.92
]",        0.89
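The two lists above appear to rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Dashboards of this kind usually produce such lists by projecting the feature's decoder direction through the model's unembedding matrix. The source does not say how these particular numbers were computed, so the following is only a minimal sketch under that assumption. The names `logit_scores`, `top_and_bottom`, `decoder_dir`, and `W_U` are hypothetical, and plain Python lists stand in for real weight tensors.

```python
from typing import List, Sequence, Tuple


def logit_scores(decoder_dir: Sequence[float],
                 W_U: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: project a feature's decoder direction
    (length d_model) through an unembedding matrix W_U
    (d_model rows x vocab_size columns), giving one score per token."""
    d_model = len(decoder_dir)
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]],
                                    List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores, most negative first
    positive = ranked[::-1][:k]    # highest scores, most positive first
    return negative, positive


# Toy example: d_model = 2, vocabulary of three tokens.
scores = logit_scores([1.0, -1.0], [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]])
negative, positive = top_and_bottom(scores, ["a", "b", "c"], k=1)
print(negative, positive)  # [('b', -1.0)] [('c', 2.0)]
```

In a real model, the vocabulary would hold tens of thousands of tokens and k would be 10, matching the length of each list above.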
Activation Density: 0.173%
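Activation density is most naturally read as the share of tokens on which the feature fires. At 0.173%, that would be roughly 173 tokens out of every 100,000. The sketch below assumes "fires" means an activation strictly above zero. The dashboard's actual threshold and token sample are not stated here, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation
    exceeds `threshold` (assumed here to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Two of five tokens fire, so the density is 40%.
print(activation_density([0.0, 0.5, 0.0, 0.0, 2.0]))  # 40.0
```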