INDEX
Explanations
temporal phrases indicating recurring actions or events
phrases that reference specific time periods or moments
Negative Logits
terday        -0.67
MET           -0.67
ertodd        -0.63
liction       -0.62
cipline       -0.60
Ing           -0.58
ATHER         -0.58
rored         -0.58
NESS          -0.57
PB            -0.57
Positive Logits
onwards       0.92
perspective   0.86
standpoint    0.86
onward        0.84
perspectives  0.72
ratch         0.72
orum          0.70
outset        0.69
inception     0.68
users         0.67
Activations Density: 0.027%