Explanations
specific dates in the format "ago X", where X is a number representing the time in the past
repeated phrases indicating time or temporal references
Negative Logits
mathemat      -0.95
suspic        -0.84
belie         -0.78
plaus         -0.75
challeng      -0.72
glim          -0.68
sclerosis     -0.67
ifying        -0.66
uniqueness    -0.65
lining        -0.65
Positive Logits
vernment      1.38
zzi           1.01
onga          0.99
edia          0.94
asca          0.92
zzo           0.88
xon           0.87
zeb           0.83
orthy         0.82
unin          0.82
Activations Density 0.020%
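
The density figure above can be read as the share of sampled token positions on which the feature fires at all. The page does not state this definition, so it is an assumption here. The sketch below is only an illustration: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions with a
    non-zero (above-threshold) activation for this feature.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 2 of 10,000 token positions.
sample = [0.0] * 9998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.020% corresponds to roughly 2 active positions per 10,000 tokens, which marks the feature as very sparse.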