INDEX
Explanations
instances of significant moral or emotional reactions, particularly related to issues of injustice and societal perception
Negative Logits
Token          Logit
мерик          -0.14
529            -0.14
gün            -0.14
regional       -0.14
Roland         -0.14
FormGroup      -0.13
radio          -0.13
olina          -0.13
radioactive    -0.13
radio          -0.13
Positive Logits
Token          Logit
Rip            0.33
Jack           0.32
rip            0.29
ripper         0.27
Jack           0.27
JACK           0.26
jack           0.23
jack           0.21
Victorian      0.21
serial         0.20
Activations Density 0.003%