Explanations
thought-provoking questions related to moral dilemmas
Negative Logits
senal     -0.61
iage      -0.60
è»        -0.59
Street    -0.59
éĥ        -0.56
Living    -0.55
ãĤ©       -0.54
Rooms     -0.54
çͰ        -0.54
å¥        -0.54
Positive Logits
causation      0.75
empir          0.71
empirical      0.70
economists     0.69
Argument       0.68
depends        0.66
prag           0.65
speculative    0.64
consequential  0.64
nonetheless    0.64
Activations Density 3.041%