Explanations
assertive statements or claims within a text
phrases that indicate claims or statements about specific events or actions
Negative Logits
english    -0.84
Laughs     -0.82
byss       -0.81
register   -0.79
rex        -0.76
vc         -0.76
mmmm       -0.75
utical     -0.74
erenn      -0.73
EStream    -0.73
Positive Logits
they        0.88
ousted      0.85
Saddam      0.83
he          0.82
hackers     0.82
Barack      0.79
millions    0.78
President   0.77
she         0.76
Hillary     0.75
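The two lists rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). The sketch below shows one way such lists can be produced. It assumes that each score is the dot product of the feature's decoder direction with the token's unembedding vector. The function name, the 2-dimensional toy vectors, and the tiny vocabulary are all hypothetical.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_row: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by a feature's direct effect on their logits.

    Assumption (hypothetical): the effect on a token is the dot product of
    the feature's decoder direction with that token's unembedding vector.
    """
    scores = {
        token: sum(d * u for d, u in zip(decoder_row, vec))
        for token, vec in unembed.items()
    }
    ascending = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ascending[:k]        # most suppressed tokens first
    positive = ascending[::-1][:k]  # most promoted tokens first
    return negative, positive


# Hypothetical 2-dimensional toy example.
decoder_row = [1.0, 0.0]
unembed = {
    "they": [0.9, 0.1],
    "he": [0.8, 0.0],
    "english": [-0.8, 0.3],
    "rex": [-0.7, 0.0],
}
negative, positive = top_logit_effects(decoder_row, unembed, k=2)
print([tok for tok, _ in negative])  # ['english', 'rex']
print([tok for tok, _ in positive])  # ['they', 'he']
```

A real model's unembedding matrix covers tens of thousands of tokens. At that scale, a vectorized top-k would replace the pure-Python sort, but the ranking logic stays the same.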
Activation Density: 0.217%
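If density is read as a per-token rate, 0.217% means the feature fires on roughly 2 of every 1,000 tokens in the sample. The sketch below shows the calculation. It assumes density is the share of token positions whose activation exceeds zero. The threshold default and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions where the feature is active.

    Assumption (hypothetical): "active" means activation > threshold.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 217 active positions out of 100,000 tokens.
acts = [0.0] * 99_783 + [1.3] * 217
density = activation_density(acts)
print(f"{density:.3%}")  # 0.217%
```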