INDEX
Explanations
phrases related to making judgments
references to judgment or evaluations of people and actions
Negative Logits
ohyd       -0.81
iliar      -0.79
uld        -0.79
ospons     -0.77
adra       -0.77
oteric     -0.70
dayName    -0.70
jc         -0.69
jri        -0.69
iere       -0.68
Positive Logits
judging     0.85
jud         0.80
ģĸ          0.77
Gorsuch     0.74
judgment    0.73
judgement   0.72
Advocate    0.70
harshly     0.70
judgments   0.70
lessly      0.70
Activations Density: 0.030%