INDEX
Explanations
words related to reaching a final decision or reaching an agreement
statements about conclusions and opinions
Negative Logits
xtap        -0.80
quet        -0.60
quin        -0.60
iries       -0.57
erc         -0.57
quer        -0.57
illions     -0.54
egu         -0.53
trailed     -0.53
agu         -0.52
Positive Logits
that         1.12
that         1.09
THAT         0.87
thats        0.85
otherwise    0.83
THEY         0.82
they         0.78
there        0.77
there        0.77
That         0.72
Activations Density: 0.408%