INDEX
Explanations
instances of negotiation and dialogue
Negative Logits
\e            -0.07
iscrim        -0.07
pres          -0.06
pector        -0.06
ucc           -0.06
iy            -0.06
imps          -0.06
idian         -0.06
ight          -0.06
potentially   -0.06
Positive Logits
æķħ           0.10
pretended     0.09
fe            0.08
Fake          0.08
Fake          0.08
fake          0.08
pret          0.08
æķħ           0.08
åģĩ           0.07
åζéĢł         0.07
Activations Density 0.056%