Explanations
references to violent actions and injuries, particularly involving facial harm
Negative Logits
abe           -0.17
elp           -0.17
687           -0.15
Bened         -0.14
inne          -0.14
pute          -0.14
.jetbrains    -0.14
ige           -0.13
idget         -0.13
lor           -0.13
Positive Logits
Bund          0.15
ána           0.15
withString    0.15
bund          0.15
Roose         0.14
.parameter    0.14
thereby       0.14
ooled         0.14
|required     0.14
ormap         0.13
Activations Density 0.037%