INDEX
    Explanations

    proper nouns and names, potentially related to investigations or scandals

    Negative Logits
     thous      -0.77
     conclud    -0.77
     [*         -0.71
     Vaugh      -0.71
    ãĥ¼ãĥĨ      -0.68
     proport    -0.66
     vulner     -0.65
     ingred     -0.63
    ModLoader   -0.62
     detrim     -0.61
    Positive Logits
    zeb         0.68
    imo         0.67
    mt          0.67
    ice         0.64
    gallery     0.62
    letters     0.61
    ush         0.61
    info        0.61
    amin        0.61
    oba         0.61
    Activation Density: 0.074%

    No Known Activations