Negative Logits
Token      Logit
Sensors    -0.09
sensors    -0.09
Sensor     -0.08
Sensors    -0.08
sensor     -0.08
sensor     -0.08
Io         -0.07
वर्ष        -0.07
_sensor    -0.07
Sensor     -0.07
Positive Logits
Token        Logit
derog        0.14
disrespect   0.11
racist       0.11
respectful   0.10
hateful      0.10
sexist       0.10
العن         0.10
profanity    0.10
insulting    0.09
lädt         0.09
Activations Density 0.042%