Explanations
phrases that reflect conflicting views or hypocrisy in discussions
Negative Logits
ÎŃÏģγ        -0.16
ayar         -0.15
lik          -0.15
ensibly      -0.15
MeasureSpec  -0.14
flower       -0.14
dro          -0.14
ãĥªãĥ¼ãĤº    -0.14
):?>↵        -0.14
><?          -0.14
Positive Logits
isz          0.15
mac          0.15
antes        0.15
ascar        0.14
udic         0.14
Deutsch      0.14
DOT          0.14
adow         0.14
Äijỡ         0.14
Zimmer       0.14
Activations Density: 0.114%