Explanation: expressions of negation or denial
Negative Logits
Token               Logit
KommentareTeilen    -0.74
Muffins             -0.73
raszam              -0.72
fibrillation        -0.71
MPC                 -0.69
:]:                 -0.69
UNEP                -0.69
aspectj             -0.69
StateToProps        -0.68
ilesh               -0.68

Positive Logits
Token               Logit
Never               1.55
NEVER               1.49
NEVER               1.48
never               1.48
Never               1.46
never               1.41
EVER                1.21
Nunca               1.16
Ever                1.11
Nunca               1.11
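
The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such a ranking is commonly produced, assuming each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The function names `logit_effects` and `top_and_bottom` and the toy 2-dimensional vectors are hypothetical; they are not the dashboard's code or the model's real weights.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect = dot(feature decoder direction, token unembedding vector).
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Return a mapping token -> dot product of decoder_dir with its unembedding."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most promoted tokens, k most suppressed tokens)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with made-up 2-dimensional vectors (not real model weights).
decoder_dir = [1.0, 0.5]
unembed = {
    "never": [1.2, 0.4],
    "Nunca": [0.9, 0.3],
    "Muffins": [-0.5, -0.3],
    "MPC": [-0.4, -0.2],
}
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print([t for t, _ in pos])  # ['never', 'Nunca']
print([t for t, _ in neg])  # ['Muffins', 'MPC']
```

Under this reading, a feature whose decoder direction aligns with the unembeddings of "never", "Never", and "Nunca" promotes those tokens, while unrelated tokens such as "Muffins" end up with small negative scores.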

Activation density: 0.050%
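
A density of 0.050% means the feature is active on roughly 1 token in 2,000. A minimal sketch of that calculation, assuming density is the fraction of tokens whose activation exceeds zero; the helper `activation_density` and the activation values are hypothetical.

```python
# Hypothetical sketch: activation density = share of tokens where the feature fires.
# Assumption: "fires" means activation strictly greater than `threshold` (default 0).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one token")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# One made-up firing among 2,000 tokens.
acts = [0.0] * 2000
acts[123] = 3.7
density = activation_density(acts)
print(f"{density:.3%}")  # 0.050%
```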