Explanations
phrases indicating a change has occurred or is occurring
instances of the word "anymore," indicating a theme of change or cessation
Negative Logits
erest      -0.72
tale       -0.68
ortment    -0.67
stood      -0.66
urer       -0.65
maximum    -0.62
yp         -0.61
ouri       -0.60
ffe        -0.59
erate      -0.58
Positive Logits
adays      0.89
than       0.79
iatus      0.77
;)         0.77
nces       0.77
:-)        0.73
!!!!!      0.71
:)         0.71
:(         0.68
ðŁĻĤ       0.68
Activations Density: 0.023%
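The last positive-logit token, ðŁĻĤ, looks like a vocabulary entry shown in a byte-level BPE alphabet rather than as readable text. The sketch below assumes a GPT-2-style byte-to-unicode table, which this page does not confirm; the helper names gpt2_byte_decoder and decode_token are hypothetical and used only for illustration. Under that assumption it maps each displayed character back to its raw byte and decodes the result as UTF-8.

```python
def gpt2_byte_decoder():
    """Invert a GPT-2-style byte-to-unicode table (assumed, not confirmed by the page)."""
    # Bytes that are displayed as themselves in the byte-level alphabet.
    byte_values = (
        list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    )
    code_points = byte_values[:]
    offset = 0
    # Every remaining byte is displayed as code point 256, 257, ... in byte order.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {chr(c): b for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Turn a displayed vocabulary string back into readable text."""
    table = gpt2_byte_decoder()
    raw = bytes(table[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ðŁĻĤ"))
```

If the assumption holds, the token is a single emoji, which would fit alongside the emoticons ;), :-), :) and :( in the same list. Plain ASCII tokens such as adays pass through unchanged.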