Explanations
years mentioned within parentheses
punctuation marks and their contexts in a text
Negative Logits
enta          -0.68
agog          -0.62
aban          -0.62
zers          -0.62
arna          -0.60
tube          -0.60
therap        -0.60
kus           -0.59
elling        -0.59
liga          -0.59
Positive Logits
[              0.69
then           0.67
etc            0.63
then           0.62
whereas        0.62
nevertheless   0.61
(token text not shown)   0.60
selves         0.60
secondly       0.60
furthermore    0.59
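Lists like the Negative Logits and Positive Logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. This page does not say how its values were computed, so the following is a minimal sketch that assumes that convention. `feature_logit_effects`, `decoder_vec`, and `unembed` are hypothetical names, and the toy weights in the usage check are invented for illustration.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed layout: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k most positive, top-k most negative) (token, logit) pairs.

    Assumption: the logit effect of a feature on token t is the dot product
    of the feature's decoder vector with column t of the unembedding matrix.
    """
    d_model = len(decoder_vec)
    # Direct logit effect on every vocabulary entry: decoder_vec @ unembed
    logits = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    # Largest values first for the positive list
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Most negative values first for the negative list
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 3 tokens (made-up numbers)
    pos, neg = feature_logit_effects(
        [1.0, -0.5],
        [[0.2, -0.4, 0.1], [0.6, 0.2, -0.2]],
        ["then", "enta", "etc"],
        k=1,
    )
    print(pos, neg)
```

With these toy weights the logits are roughly -0.1 for "then", -0.5 for "enta", and 0.2 for "etc", so "etc" heads the positive list and "enta" heads the negative one. The default `k=10` mirrors the ten entries shown in each list above.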
Activation Density: 0.150%
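Activation density is usually reported as the percentage of token positions on which the feature is active. The page does not state its activation threshold, so this minimal sketch assumes "active" means an activation strictly above zero. `activation_density` is a hypothetical helper and not part of any particular library.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy data: the feature fires on 3 of 2,000 token positions
    acts = [0.0] * 1997 + [1.2, 0.4, 3.0]
    print(f"{activation_density(acts):.3f}%")
```

For example, a feature that is active on 3 of 2,000 tokens has a density of 0.150%, so a density this low marks a feature that fires on only a small share of positions.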