Explanations
causative terms, indicating actions leading to certain consequences
phrases indicating causal relationships or changes in society
Negative Logits
Token        Logit
Moss         -0.66
Owens        -0.64
urden        -0.63
moss         -0.61
Ware         -0.61
Pool         -0.61
Winc         -0.60
Licensed     -0.60
Monitor      -0.59
Cola         -0.59
Positive Logits
Token        Logit
revolutions   0.86
havoc         0.80
MpServer      0.76
EStream       0.74
riots         0.70
uate          0.70
ocument       0.70
choes         0.69
Discover      0.68
versible      0.67
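The dashboard does not say how the two logit lists were produced. One common approach for sparse-autoencoder features is to project the feature's decoder direction straight through the model's unembedding matrix and report the k lowest and k highest scores. The sketch below shows that ranking under exactly that assumption; it ignores LayerNorm and every downstream layer. The function name `top_logit_effects`, its parameters, and the toy weights are hypothetical and are not taken from this dashboard.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def top_logit_effects(
    decoder_row: Sequence[float],          # feature's decoder direction, length d_model
    unembed: Sequence[Sequence[float]],    # unembedding matrix, shape d_model x vocab_size
    vocab: Sequence[str],                  # token strings, length vocab_size
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and k most positive direct logit effects.

    Assumption (hypothetical method): effect[t] = sum_d decoder_row[d] * unembed[d][t],
    i.e. decoder_row @ unembed, with no LayerNorm or later layers applied.
    """
    vocab_size = len(vocab)
    d_model = len(decoder_row)
    # Dot product of the decoder direction with each token's unembedding column.
    effects = [
        sum(decoder_row[d] * unembed[d][t] for d in range(d_model))
        for t in range(vocab_size)
    ]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(vocab_size), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in order[::-1][:k]]
    return negative, positive


# Toy usage with made-up weights (d_model = 2, four hypothetical tokens).
neg, pos = top_logit_effects(
    [1.0, 0.0],
    [[0.5, -0.2, 0.9, 0.0], [1.0, 1.0, 1.0, 1.0]],
    ["tok_a", "tok_b", "tok_c", "tok_d"],
    k=2,
)
print(neg)  # [('tok_b', -0.2), ('tok_d', 0.0)]
print(pos)  # [('tok_c', 0.9), ('tok_a', 0.5)]
```

Slicing the reversed order with `[:k]` (rather than `order[-k:]`) keeps `k=0` returning empty lists instead of the whole vocabulary.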
Activation density: 0.044%
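Assuming "activation density" means the percentage of token positions on which the feature's activation is above zero (a common reading for SAE dashboards, not something this page states), 0.044% corresponds to roughly 44 firing tokens per 100,000. A minimal sketch of that calculation follows. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires.

    Assumption (hypothetical definition): a token counts as "firing" when its
    activation is strictly greater than `threshold` (default 0.0).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Toy usage: 4 firing positions out of 10,000 tokens.
acts = [1.5 if i in (3, 50, 900, 7000) else 0.0 for i in range(10_000)]
print(activation_density(acts))  # 0.04 (percent)
```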