Explanations
strongly worded declarations of authorial opinions or arguments
the word "that" indicating arguments or claims
Negative Logits
Token     Logit
backer    -0.80
stal      -0.70
Guard     -0.67
mouth     -0.65
api       -0.63
SEE       -0.60
inar      -0.60
Champ     -0.60
atro      -0.60
aq        -0.60
Positive Logits
Token       Logit
there       0.75
although    0.72
justifies   0.69
preserving  0.69
abol        0.67
someday     0.67
we          0.65
whoever     0.65
prevailed   0.65
"[          0.64
Activations Density: 0.217%
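
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the metric, not something the page states. The function name `activation_density`, the zero threshold, and the sample counts (217 active positions out of 100,000) are hypothetical and chosen only to illustrate what a value of 0.217% would mean.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The dashboard does not spell out its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample, not the dashboard's data:
# 100,000 token positions, 217 of which have a nonzero activation.
sample = [0.0] * 99_783 + [1.0] * 217
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density this low means the feature is sparse: it fires on roughly two tokens in every thousand.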