Explanations
mentions of deadlines or time frames for certain actions or events
predictions or commitments regarding future policy changes
Negative Logits
Token       Logit
esta        -0.82
Reviewer    -0.75
IENT        -0.73
surprises   -0.66
>[          -0.66
ractor      -0.62
breakout    -0.61
elo         -0.59
ibli        -0.58
APD         -0.57

Positive Logits
Token          Logit
2020           0.98
lest           0.98
indefinitely   0.93
2021           0.91
2025           0.91
outlawed       0.85
instead        0.84
2022           0.81
Instead        0.80
2024           0.80
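The logit lists above can be read as the feature's direct effect on the model's output vocabulary. The following is a minimal sketch, assuming (the page does not say this) that each value is the dot product of the feature's decoder vector with each column of the unembedding matrix, with the most negative and most positive tokens reported. The function name `top_logit_effects` and the toy weights are hypothetical, for illustration only.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive token effects of one feature.

    Hypothetical sketch: assumes unembed is laid out as d_model rows by
    n_vocab columns, so unembed[i][j] maps residual dimension i to token j.
    """
    n_vocab = len(vocab)
    d_model = len(decoder_vec)
    # Direct logit effect of the feature direction on every vocabulary token.
    effects = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(n_vocab)
    ]
    # Token indices sorted from most negative to most positive effect.
    ranked = sorted(range(n_vocab), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in ranked[:k]]
    # Largest effects first for the positive list.
    positive = [(vocab[j], effects[j]) for j in reversed(ranked[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
    neg, pos = top_logit_effects(
        [1.0, 0.0],
        [[0.5, -0.2, 0.9, -0.7], [0.1, 0.1, 0.1, 0.1]],
        ["a", "b", "c", "d"],
        k=2,
    )
    print(neg)  # [('d', -0.7), ('b', -0.2)]
    print(pos)  # [('c', 0.9), ('a', 0.5)]
```

Real dashboards may additionally normalize the decoder vector or apply a final layer norm before unembedding, so the exact values on this page need not match this unadjusted product.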

Activation Density: 0.805%
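Activation density is usually read as the percentage of tokens on which the feature fires; here that is 0.805%, so the feature is sparse. Below is a minimal sketch, assuming that "fires" means an activation strictly above a threshold of zero (the page does not state the threshold). The function name `activation_density` and the sample activations are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds threshold (hypothetical helper)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature is considered active.
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


if __name__ == "__main__":
    # Toy activations over 8 tokens; the feature fires on 2 of them.
    print(activation_density([0.0, 0.0, 0.3, 0.0, 1.2, 0.0, 0.0, 0.0]))  # 25.0
```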