INDEX
Explanations
various references to threats and dangers in social, political, and environmental contexts
Negative Logits
iao          -0.21
artin        -0.17
ocker        -0.15
Forbidden    -0.15
ocket        -0.15
quine        -0.14
ecided       -0.14
Shock        -0.14
.pixel       -0.14
ctal         -0.14
Positive Logits
posed        0.40
Pos          0.29
posed        0.25
pos          0.23
facing       0.22
faced        0.22
ened         0.22
ening        0.22
pose         0.22
assessment   0.21
Activations Density: 0.042%