Explanations
phrases beginning with 'Whether'
the word "whether" and its variations, indicating uncertainty or choices being presented
Negative Logits
agement     -0.77
ulations    -0.72
thal        -0.70
assi        -0.69
atari       -0.68
bage        -0.67
aging       -0.67
vention     -0.66
ode         -0.66
uers        -0.64
Positive Logits
soever          1.09
consciously     0.96
intentional     0.78
theless         0.75
intentionally   0.72
orally          0.66
warranted       0.66
overtly         0.65
includ          0.64
trolling        0.63
Activations Density 0.028%