INDEX
Explanations
statements or declarations made within an official report
Negative Logits
Occup       -0.66
pron        -0.65
grain       -0.64
Maker       -0.64
Pont        -0.63
ben         -0.63
brook       -0.62
acqu        -0.62
spectator   -0.61
patch       -0.61
Positive Logits
titled      0.74
furthermore 0.70
summar      0.70
scathing    0.70
quoting     0.69
omin        0.69
comprises   0.68
seq         0.66
authors     0.66
convinc     0.65
Activations Density: 10.210%