    Explanations

    mentions of official complaints or reports

    instances of the word "complaint."

    Negative Logits
    artifacts     -0.85
    itals         -0.76
    raham         -0.75
    aughs         -0.74
    orth          -0.69
    bern          -0.69
    atomic        -0.69
    mers          -0.69
    sung          -0.68
    eton          -0.68
    Positive Logits
     complaint     1.07
     complaints    1.05
     alleging      0.92
     alleges       0.84
     levied        0.76
     complains     0.76
     lodged        0.75
     leveled       0.73
    naire          0.72
     complaining   0.72
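The two lists above rank vocabulary tokens by how strongly this feature raises or lowers their output logits. This page does not say how the scores are computed. A common approach, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and then take the top and bottom k entries. The function names and the toy numbers below are hypothetical and for illustration only.

```python
# Hypothetical sketch: rank tokens by the logit contribution of a feature's
# decoder direction, assuming scores = direction @ unembed.
from typing import List, Sequence, Tuple


def logit_contributions(direction: Sequence[float],
                        unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return one score per vocabulary token.

    `unembed` is assumed to have shape (d_model, vocab_size).
    """
    vocab_size = len(unembed[0])
    return [sum(direction[i] * unembed[i][j] for i in range(len(direction)))
            for j in range(vocab_size)]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and the k lowest-scoring tokens."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # most negative scores first
    return positive, negative


# Toy example: a 2-dimensional model with a 3-token vocabulary.
direction = [1.0, -0.5]
unembed = [[0.9, 0.1, -0.4],
           [-0.3, 0.2, 0.6]]
vocab = [" complaint", "naire", "itals"]
scores = logit_contributions(direction, unembed)
positive, negative = top_and_bottom(scores, vocab, k=1)
print(positive, negative)
```

Leading spaces in tokens such as " complaint" are kept on purpose. In many BPE vocabularies, a leading space marks a word-initial token.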
    Activation Density: 0.015%
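Activation density is usually read as the percentage of tokens on which the feature fires. At 0.015%, that is roughly 15 tokens in every 100,000. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The function name and the threshold default are assumptions, not taken from this page.

```python
# Hypothetical sketch: activation density as the percentage of tokens
# whose feature activation exceeds `threshold` (assumed to be 0.0).
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Return the percentage of tokens on which the feature is active."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 15 active tokens out of 100,000 gives a density of about 0.015%.
acts = [0.0] * 99_985 + [1.0] * 15
print(activation_density(acts))
```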

    No Known Activations