INDEX
Explanations
statements related to philosophical and social concepts
statements and concepts related to religion and morality
Negative Logits
Downs     -0.81
congr     -0.79
icago     -0.77
throats   -0.75
uld       -0.74
amins     -0.73
Celeb     -0.73
ulia      -0.72
Turns     -0.69
Buzz      -0.67
Positive Logits
rooted          1.34
characterized   1.22
insepar         1.20
intimately      1.17
conceived       1.17
mediated        1.14
intrinsically   1.12
shaped          1.11
embodied        1.09
grounded        1.09
Activations Density: 0.283%