Explanations
negative feedback and complaints
Negative Logits
studies     0.75
grids       0.70
graphicx    0.69
often       0.68
            0.68
studies     0.65
as          0.64
enthusi     0.64
grids       0.63
的一些      0.63
Positive Logits
disgusted       1.31
disgraceful     1.27
ruining         1.25
deplorable      1.23
humiliated      1.20
😡              1.19
intolerable     1.17
outraged        1.16
incompetence    1.16
disgusting      1.16
Activations Density 0.532%