Explanations
phrases related to conflicts or disagreements
the presence of conjunctions and phrases indicating comparisons or connections between ideas
Negative Logits
Token          Logit
Profile        -0.61
Guerrero       -0.59
Starr          -0.57
Sep            -0.57
Pledge         -0.57
during         -0.56
successfully   -0.55
QB             -0.55
essage         -0.55
GD             -0.55
Positive Logits
Token          Logit
wiser          0.71
equals         0.68
eh             0.64
shalt          0.62
stroke         0.62
pudding        0.61
nutshell       0.61
caveat         0.60
caveats        0.60
pired          0.59
Activations Density 0.857%