INDEX
Explanations
adjectives pertaining to rational thought or behavior
terms related to rationality and civility in discourse
Negative Logits
aunted          -0.84
ammy            -0.79
stals           -0.78
pain            -0.77
stress          -0.75
Strongh         -0.75
chu             -0.73
zona            -0.71
asso            -0.71
bender          -0.71
Positive Logits
tarian          1.04
enough          0.80
\\\\\\\\        0.79
minded          0.78
centrist        0.78
democracies     0.77
Reviewer        0.77
establishment   0.76
ynes            0.75
glers           0.74
Activations Density: 0.074%