INDEX
Explanations
phrases related to criticism or dissatisfaction
references to complaints and grievances
Negative Logits
Token       Logit
unin        -0.73
Kop         -0.69
alive       -0.65
chie        -0.64
rican       -0.63
Alpha       -0.63
Sagan       -0.62
Sequence    -0.62
olit        -0.61
Recon       -0.61
Positive Logits
Token          Logit
complaints     3.61
complaint      2.59
grievances     2.16
complain       2.09
complains      2.05
criticisms     1.90
objections     1.88
accusations    1.78
complained     1.72
complaining    1.68
Activations Density: 0.016%