Explanations
years or decades
mentions of decades or specific years
Negative Logits
aukee     -0.68
masc      -0.68
semble    -0.57
fing      -0.57
dylib     -0.55
venge     -0.54
Rivals    -0.53
avorite   -0.50
probing   -0.50
verbs     -0.50
Positive Logits
s         1.29
ties      0.96
ixties    0.94
eties     0.89
ies       0.88
sie       0.84
enthal    0.83
sburg     0.78
-'        0.75
era       0.74
Activations Density 0.053%