INDEX
Explanations
contradictory statements or nuances in opinions
Negative Logits
luck           -0.17
fortunately    -0.16
thankfully     -0.16
Yours          -0.15
ominator       -0.14
xC             -0.14
oun            -0.14
é¡             -0.14
fortunately    -0.14
sized          -0.14
Positive Logits
man            0.22
dam            0.20
then           0.20
sometimes      0.19
who            0.18
c              0.17
seriously      0.17
thus           0.17
then           0.17
again          0.16
Activations Density: 0.228%