INDEX
    Explanations

    names, particularly those associated with crime or scandals

    Negative Logits
    Token     Logit
    oog       -0.20
    ovel      -0.19
    ed        -0.18
    iag       -0.18
    iens      -0.16
    ovsky     -0.16
    ogie      -0.16
    eses      -0.16
    oton      -0.16
    ogs       -0.15

    Positive Logits
    Token     Logit
    rr         0.26
    ington     0.22
    inger      0.22
    r          0.21
    era        0.20
    amient     0.20
    icks       0.20
    ick        0.19
    ort        0.19
    itt        0.19

    Activation Density: 0.022%

    No Known Activations