INDEX
Explanations
keywords related to issues of morality, self-interest, and personal attributes or actions
terms related to societal behavior and interpersonal relationships
Negative Logits
]=          -0.66
}}}         -0.66
}}          -0.63
ãĤ´ãĥ³      -0.61
Cheong      -0.59
TOD         -0.57
)]          -0.56
atorium     -0.56
writ        -0.55
fixme       -0.55
Positive Logits
that        1.31
that        1.07
THAT        1.04
who         0.94
which       0.87
whom        0.86
That        0.78
who         0.78
That        0.77
whose       0.76
Activations Density: 0.282%