Explanations
phrases expressing clarity or candor
phrases emphasizing clarity and straightforwardness
Negative Logits
ources    -0.70
issance   -0.64
Combine   -0.64
urn       -0.62
naires    -0.59
ighting   -0.58
ERE       -0.58
orer      -0.57
pockets   -0.57
Erie      -0.56
Positive Logits
blunt      1.15
frank      1.12
honest     1.04
unequiv    0.96
clear      0.96
explicit   0.90
simple     0.87
precise    0.86
truthful   0.85
simple     0.85
Activations Density 0.092%