INDEX
Explanations
phrases related to criticism or critique
instances of criticism towards people, organizations, or actions
Negative Logits
Token       Logit
aceae       -0.77
frog        -0.76
ère         -0.73
*.          -0.72
producing   -0.71
hots        -0.70
toget       -0.70
tackle      -0.70
stable      -0.69
llular      -0.68
Positive Logits
Token       Logit
portrayal   1.15
lack        1.06
inaction    1.05
notion      1.04
motives     1.01
adequ       1.00
legality    0.96
hypocrisy   0.95
validity    0.92
legitimacy  0.92
Activations Density 0.272%
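
As a rough illustration of what a density figure like this usually means, the sketch below assumes that activation density is the fraction of token positions on which the feature fires (activation above zero). That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 27 of which activate the feature.
sample = [0.0] * 9973 + [1.3] * 27
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.272% would mean the feature is active on roughly 27 out of every 10,000 tokens.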