INDEX
Explanations
statements and opinions, often introduced with verbs like 'say', 'added', 'argued', 'claimed', 'went', 'said', 'contended', or 'noted'
verbs and phrases related to statements and claims
Negative Logits
Token       Logit
prus        -0.63
adesh       -0.58
ptives      -0.56
confir      -0.53
speech      -0.52
ainted      -0.49
estern      -0.49
NL          -0.49
reporting   -0.49
teasp       -0.48
Positive Logits
Token       Logit
,           0.89
,,          0.82
*,          0.81
,[          0.75
.,          0.73
!,          0.69
,—          0.67
,...        0.67
®,          0.65
,-          0.64
Activations Density 0.192%