INDEX
Explanations
comparisons of different situations over time
comparative phrases indicating changes over time
Negative Logits
)].         -0.76
erous       -0.73
ilic        -0.56
Bit         -0.55
Poe         -0.54
)]          -0.53
wiki        -0.53
Centauri    -0.53
Appropri    -0.53
Khan        -0.53
Positive Logits
ever        1.04
usual       0.91
before      0.85
usual       0.85
ever        0.83
EVER        0.82
elsewhere   0.82
anywhere    0.81
Previously  0.79
before      0.76
Activations Density 0.163%