INDEX
Explanations
words related to criticism or judgment
terms related to judgment or evaluation of people, policies, or actions
Negative Logits
Token        Logit
tein         -0.69
Milky        -0.63
rontal       -0.63
ettlement    -0.59
oya          -0.56
abiding      -0.55
ipeg         -0.55
ggies        -0.54
rez          -0.54
rouse        -0.54
Positive Logits
Token        Logit
favorably    1.01
unfairly     0.98
skept        0.92
harshly      0.79
Reviewer     0.79
igated       0.72
by           0.71
graded       0.70
Ú            0.70
merciless    0.70
Activations Density: 0.137%