INDEX
Explanations
information related to research studies and surveys
Negative Logits
çīĪ                 -0.79
iHUD                -0.71
DragonMagazine      -0.69
utsche              -0.63
displayText         -0.62
APTER               -0.61
confir              -0.59
iple                -0.57
Laser               -0.54
Panzer              -0.54
Positive Logits
themselves          0.92
rapists             0.85
their               0.84
psychologically     0.80
disproportionately  0.77
biologically        0.77
theirs              0.75
atheists            0.75
sexist              0.75
sexually            0.73
Activations Density: 0.866%