INDEX
Explanations
differences or contradictions in statements
phrases related to conversation and communication dynamics
Negative Logits
culus      -0.98
atri       -0.82
cephal     -0.76
ahime      -0.76
ggle       -0.74
taboola    -0.74
anus       -0.72
ãĥī        -0.71
tnc        -0.71
ummer      -0.68
Positive Logits
THEN          0.97
preferably    0.93
then          0.89
verbally      0.87
paraph        0.86
retweet       0.80
phrases       0.79
concise       0.79
criticize     0.79
orally        0.78
Activations Density: 0.625%