INDEX
Explanations
phrases and terms related to moral and legal judgments
Negative Logits
Token     Logit
acan      -0.68
iping     -0.67
nants     -0.67
rongh     -0.66
arta      -0.65
jong      -0.64
phabet    -0.62
76561     -0.61
afia      -0.60
axy       -0.59
Positive Logits
Token         Logit
raining       0.87
ceivable      0.75
impossible    0.72
folly         0.71
coincidence   0.69
ifiable       0.65
to            0.64
advisable     0.63
conceivable   0.62
EC            0.62
Activations Density: 0.269%