Explanations
phrases related to affirmations
affirmations or confirmations within the text
Negative Logits
abal          -0.77
èĢħ           -0.76
arted         -0.73
rance         -0.70
ridor         -0.69
ounded        -0.66
vati          -0.65
liner         -0.64
Discussion    -0.64
gall          -0.64
Positive Logits
sir           0.96
technically   0.79
THERE         0.77
yes           0.74
please        0.69
anecd         0.67
yeah          0.66
sexism        0.65
there         0.64
insofar       0.63
Activations Density 0.050%
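
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature has a nonzero activation; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and only illustrate the arithmetic:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 2000 token positions.
sample = [0.0] * 1999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.050% corresponds to the feature firing on roughly one token in every 2,000.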