Explanations
positive news or updates
phrases indicating positive outcomes or news
Negative Logits
Token         Logit
edIn          -0.74
abwe          -0.66
renheit       -0.65
Cola          -0.64
uilding       -0.64
zai           -0.62
jri           -0.61
arate         -0.61
atever        -0.61
appropriated  -0.60
Positive Logits
Token         Logit
thing         0.85
est           0.84
iest          0.80
folks         0.78
stuff         0.78
guy           0.75
bits          0.74
ol            0.74
ones          0.73
guys          0.73
Activations Density 0.084%