INDEX
Explanations
mentions of official complaints or reports
instances of the word "complaint."
Negative Logits
artifacts    -0.85
itals        -0.76
raham        -0.75
aughs        -0.74
orth         -0.69
bern         -0.69
atomic       -0.69
mers         -0.69
sung         -0.68
eton         -0.68
Positive Logits
complaint    1.07
complaints   1.05
alleging     0.92
alleges      0.84
levied       0.76
complains    0.76
lodged       0.75
leveled      0.73
naire        0.72
complaining  0.72
Activations Density: 0.015%