Explanations
phrases related to asserting opinions forcefully
assertive statements challenging misconceptions or claims
Negative Logits
ukong       -0.80
knit        -0.71
chron       -0.69
figured     -0.66
iverpool    -0.63
Nurs        -0.62
Elliot      -0.60
anchester   -0.60
encers      -0.59
encer       -0.59
Positive Logits
unworthy            1.00
unacceptable        0.99
folly               0.96
irresponsible       0.95
foolish             0.93
heresy              0.93
counterproductive   0.92
futile              0.90
dishon              0.88
invalid             0.86
Activations Density: 0.440%