INDEX
Explanations
concepts related to viewpoints or interpretations
phrases related to subjective interpretations and definitions of concepts
Negative Logits
mson         -0.72
orst         -0.71
vine         -0.70
fam          -0.70
reau         -0.68
depended     -0.64
stead        -0.64
ktop         -0.63
hement       -0.63
imentary     -0.62
Positive Logits
how            0.82
criminality    0.81
what           0.79
morality       0.77
reality        0.73
masculinity    0.72
sexuality      0.72
homosexuality  0.70
events         0.70
rationality    0.70
Activations Density: 0.171%