Explanations
information related to social issues and current events
Negative Logits
Token        Logit
urches       -0.64
bath         -0.62
ogenesis     -0.58
Star         -0.58
compares     -0.56
aroo         -0.56
Redditor     -0.56
CONCLUS      -0.56
Breaking     -0.55
orial        -0.54
Positive Logits
Token        Logit
previously    1.60
hitherto      1.36
formerly      1.29
originally    1.17
hoped         1.05
previous      1.04
normally      1.00
anticipated   0.99
dormant       0.97
otherwise     0.93
Activation Density: 0.668%