INDEX
Explanations
phrases related to progress or development over time
phrases indicating the status or condition of various situations
Negative Logits
Token        Logit
lication     -0.79
lement       -0.79
Cosponsors   -0.74
lees         -0.73
onga         -0.71
pora         -0.70
hardt        -0.67
obook        -0.67
ordes        -0.67
roll         -0.66
Positive Logits
Token        Logit
happening    0.89
happ         0.86
happen       0.79
cov          0.76
transpired   0.76
downhill     0.73
wrong        0.68
unfolded     0.68
undone       0.68
spir         0.66
Activations Density: 0.222%