Explanations
significant years or dates written in the format of two numbers separated by a dash and ending in a zero followed by other numbers
specific years or numerical dates in the text
Negative Logits
Token      Logit
irlf       -0.72
lying      -0.69
bunny      -0.68
gren       -0.67
peeled     -0.66
tremend    -0.64
hottest    -0.64
brightest  -0.64
holiday    -0.63
igating    -0.63
Positive Logits
Token      Logit
90         1.11
504        0.98
74         0.97
88         0.97
91         0.97
65         0.97
70         0.96
80         0.96
85         0.95
95         0.94
Activations Density: 0.038%