INDEX
Explanations
dates mentioned in historical contexts
specific years mentioned in the text
Negative Logits
Token          Logit
Dialogue       -0.76
newsp          -0.63
lawy           -0.62
unres          -0.60
olar           -0.60
explanatory    -0.59
scrut          -0.58
ricular        -0.56
hereby         -0.56
eanor          -0.55
Positive Logits
Token          Logit
-'             0.83
Coliseum       0.73
ãĥĥãĥī         0.73
é¾įå¥ij士      0.72
ãĥ¼ãĥĨãĤ£      0.71
vironment      0.69
ãĥ¯ãĥ³         0.69
ãĥŁ            0.68
â̲             0.67
chev           0.66
Activations Density: 0.171%