INDEX
Explanations
Numerical entities such as years and links in text
References to notable dates and media sources
Negative Logits
aint      -0.76
aries     -0.67
sty       -0.63
antit     -0.59
hetti     -0.58
brist     -0.58
somet     -0.57
undert    -0.57
44        -0.57
Urs       -0.56
Positive Logits
7         1.12
727       0.94
7         0.94
707       0.90
Seventh   0.87
667       0.84
677       0.84
747       0.83
697       0.81
767       0.80
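Dashboards of this kind commonly produce the positive and negative logit lists by projecting a feature's decoder direction through the model's unembedding matrix and taking the k largest and k smallest entries. The page itself does not say how its lists were computed, so the sketch below is an assumption: the names `w_dec`, `W_U` and `logit_effects` are hypothetical, the weights are random toy values rather than the real model, and mapping token ids back to strings such as "727" or "Seventh" would need the model's tokenizer, which is omitted.

```python
import numpy as np

def logit_effects(w_dec: np.ndarray, W_U: np.ndarray, k: int = 10):
    """Return (top, bottom) lists of (token_id, logit) pairs.

    Assumed shapes (hypothetical):
      w_dec: one feature's decoder direction, shape (d_model,)
      W_U:   unembedding matrix, shape (d_model, vocab_size)
    """
    logits = w_dec @ W_U                 # direct effect on every vocab logit
    order = np.argsort(logits)           # token ids sorted by ascending logit
    bottom = [(int(i), float(logits[i])) for i in order[:k]]        # most negative first
    top = [(int(i), float(logits[i])) for i in order[::-1][:k]]     # most positive first
    return top, bottom

# Toy example with random weights, not the real model's parameters
rng = np.random.default_rng(0)
d_model, vocab = 16, 50
W_U = rng.normal(size=(d_model, vocab))
w_dec = rng.normal(size=d_model)
top, bottom = logit_effects(w_dec, W_U, k=10)
```

Listing both extremes is what lets a reader see that this feature pushes up number-like tokens (7, 727, Seventh) while pushing down unrelated word fragments.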
Activation Density: 0.063%
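A density of 0.063% corresponds to roughly 63 active positions per 100,000 tokens. The sketch below assumes density means the percentage of token positions where the feature's activation is strictly greater than zero (the usual reading for ReLU-style sparse-autoencoder features); the page does not state its threshold, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts: np.ndarray) -> float:
    """Percentage of token positions where the feature fires.

    Assumption: "fires" means activation > 0.
    """
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

# 63 active positions out of 100,000 tokens
acts = np.zeros(100_000)
acts[:63] = 1.0
print(f"{activation_density(acts):.3f}%")  # prints 0.063%
```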