Explanations
instances of complaints and negativity
instances of the word "complaints"
Negative Logits
Recon          -0.68
eton           -0.66
NAS            -0.65
sf             -0.64
assisted       -0.63
arta           -0.63
insert         -0.63
bered          -0.62
aughs          -0.62
rifice         -0.62
Positive Logits
complaints      1.30
complains       0.91
complain        0.88
complaint       0.87
complaining     0.86
complained      0.85
leveled         0.82
grievances      0.78
levied          0.76
alleging        0.76
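Both lists rank vocabulary tokens by the feature's effect on the output logits: the largest weights become the positive list and the smallest become the negative list. The sketch below shows only that ranking step. It assumes the dashboard already has one weight per token. How those weights are computed is not stated here. `top_and_bottom` and the `weights` dict are hypothetical, and `weights` is seeded with a few values copied from the lists above.

```python
def top_and_bottom(
    logit_weights: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k highest- and k lowest-weighted tokens.

    Hypothetical helper: assumes one precomputed weight per token.
    """
    # Highest weights first -> "Positive Logits" list
    positive = sorted(logit_weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Lowest (most negative) weights first -> "Negative Logits" list
    negative = sorted(logit_weights.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# A few values taken from the dashboard lists above (not the full vocabulary)
weights = {
    "complaints": 1.30,
    "complains": 0.91,
    "complain": 0.88,
    "Recon": -0.68,
    "eton": -0.66,
    "NAS": -0.65,
}

pos, neg = top_and_bottom(weights, 2)
print(pos)  # [('complaints', 1.3), ('complains', 0.91)]
print(neg)  # [('Recon', -0.68), ('eton', -0.66)]
```

On a real vocabulary, `heapq.nlargest` and `heapq.nsmallest` avoid sorting every token twice. Plain `sorted` keeps the sketch easy to read.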
Activation Density: 0.014%
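Activation density is read here as the percentage of sampled tokens on which the feature fires. A density of 0.014% means roughly 14 firing tokens per 100,000. The sketch below makes two assumptions: a token counts as "firing" when its activation is strictly above a threshold of 0, and the activations are available as a flat list. The helper `activation_density` and the synthetic `acts` data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "firing" means activation > threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic example: 14 firing tokens out of 100,000 sampled tokens
acts = [0.0] * 100_000
for i in range(14):
    acts[i] = 1.0  # arbitrary positive activation

print(f"{activation_density(acts):.3f}%")  # 0.014%
```

A very low density like this fits the narrow explanations above: the feature stays silent on almost every token and fires mainly around complaint-related words.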