Explanations
hypocrisy in arguments or statements, particularly when there is a disconnect between actions and stated beliefs
Negative Logits
Jefus       -0.77
ientôt      -0.75
Efq         -0.71
ffions      -0.70
houſe       -0.70
ftant       -0.69
pleaſure    -0.69
myſelf      -0.69
Gizmos      -0.68
iffion      -0.68
Positive Logits
<bos>            0.60
文中             0.54
참고             0.49
concluding       0.49
discusses        0.46
conclude         0.46
id               0.45
detailed         0.45
subsubsection    0.44
authors          0.44
Activations Density: 0.916%