INDEX
Explanations
expressions of honesty and confessions
admitting or confessing
Negative Logits
surla            -0.76
ArgsConstructor  -0.70
RotationOrder    -0.57
########.        -0.55
InputBorder      -0.55
للمعارف          -0.53
PMailer          -0.52
exitRule         -0.52
قایناقلار        -0.52
الحره            -0.51
Positive Logits
admit      0.64
admitted   0.59
Admit      0.56
正直       0.55
admits     0.52
confess    0.51
confessed  0.50
frankly    0.49
admitting  0.48
honestly   0.47
Activations Density 0.059%