INDEX
Explanations
expressions of denial or rejection
Negative Logits
fevere       -0.63
phare        -0.59
Majefty      -0.56
inafter      -0.56
pandémie     -0.53
pleins       -0.53
kasarigan    -0.52
facie        -0.52
fsch         -0.52
Полез        -0.51
Positive Logits
EVER                0.54
NewUrlParser        0.52
argout              0.50
any                 0.50
ever                0.50
ControllerAdvice    0.49
Италијани           0.47
entuh               0.46
TagHelpers          0.46
kloped              0.46
Activations Density: 0.312%