INDEX
Explanations
statements introducing arguments or someone presenting a point of view
statements or claims made by individuals
Negative Logits
stand          -0.71
mean           -0.68
happen         -0.66
supposed       -0.64
sound          -0.62
clear          -0.62
dotted         -0.62
necessary      -0.61
present        -0.61
appropriate    -0.59
Positive Logits
argues          2.20
warns           2.12
concedes        2.12
acknowledges    2.09
contends        2.09
insists         2.08
accuses         2.07
asks            2.05
admits          2.05
blames          2.05
Activations Density 0.071%
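
The density figure can be read as the share of token positions on which the feature fires at all. The sketch below illustrates that reading. It assumes density means "fraction of positions with activation above zero", which the source does not state. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active (activation > 0). The source does not define it.
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, of which 7 are active.
toy_activations = [0.0] * 9993 + [1.5] * 7
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

On this reading, a density of 0.071% would mean the feature is active on roughly 7 of every 10,000 token positions, which fits a feature tied to a narrow class of reporting verbs.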