Explanations
phrases related to superlatives and significant events
references to significant events or achievements
Negative Logits
ĪĴ  -0.73
aleb  -0.71
eneg  -0.68
actionDate  -0.67
iencies  -0.66
milo  -0.62
letters  -0.62
gradient  -0.61
URL  -0.61
¬¼  -0.60
Positive Logits
EVER  1.07
ever  1.02
ever  0.97
since  0.83
imaginable  0.82
anywhere  0.74
!.  0.73
shortest  0.71
besides  0.69
*.  0.69
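Feature dashboards often produce negative and positive logit lists like the ones above by projecting the feature's decoder direction onto the model's unembedding matrix and ranking the per-token scores. The source does not say how these values were computed, so the sketch below assumes that method. The function names `logit_effects` and `top_and_bottom`, and the toy vectors, are hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector.

    Assumption: this direct projection is how the logit scores are defined.
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, col)) for tok, col in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical 2-dimensional toy model with four tokens.
    unembed = {
        "ever": [1.0, 0.5],
        "since": [0.8, 0.0],
        "URL": [-0.6, 1.0],
        "gradient": [-0.5, 0.0],
    }
    neg, pos = top_and_bottom(logit_effects([1.0, 0.0], unembed), k=2)
    print("Negative:", neg)
    print("Positive:", pos)
```

In a real model the unembedding has one column per vocabulary entry, which is why byte-level tokens such as `ĪĴ` or `¬¼` can appear in the ranked lists.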
Activation Density 0.481%
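An activation density like this is usually the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The source does not state the exact definition, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a threshold of 0 is used to decide whether the feature is active.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical activations over four token positions; one is active.
    print(activation_density([0.0, 0.0, 2.3, 0.0]))
```

Under this reading, 0.481% means the feature is active on roughly one token in every 208.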