Explanations
references to violence and crime-related events
preceding a period or comma
ending punctuation
Negative Logits
ujednoznacz         -0.84
ScopeManager        -0.71
TagMode             -0.68
httphttps           -0.66
disambiguazione     -0.65
Personendaten       -0.64
AnimationsModule    -0.64
featureID           -0.63
:✨                  -0.62
للاسماء             -0.61
Positive Logits
).          0.57
.)          0.53
.).         0.51
').         0.48
toutefois   0.47
).}         0.47
.}          0.46
».          0.46
).\\        0.45
.")         0.44
Activations Density: 1.332%