INDEX
Explanations
commas in the text
year followed by comma
Negative Logits
itſelf     -0.89
ſta        -0.87
myſelf     -0.85
leſs       -0.85
leſs       -0.78
ſche       -0.77
ſch        -0.77
eſſ        -0.76
juſ        -0.75
ſeveral    -0.75
Positive Logits
,          0.78
(          0.55
,          0.53
(blank)    0.51
",         0.47
”,         0.46
↵          0.46
in         0.45
on         0.44
–          0.42
Activations Density 0.072%
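As a rough illustration of the density figure above: a minimal sketch, assuming "Activations Density" means the fraction of sampled token positions on which the feature's activation is nonzero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is defined as (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 7 of which activate the feature.
sample = [0.0] * 9993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 0.7]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.072% would correspond to the feature firing on roughly 7 of every 10,000 tokens.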