INDEX
    Explanations

    mentions of notable individuals' names

    words related to negative connotations or undesirable situations

    NEGATIVE LOGITS
    pection      -0.66
    SPONSORED    -0.65
    owship       -0.65
    SAY          -0.62
     Takeru      -0.61
     ster        -0.59
     footing     -0.59
    cling        -0.58
    GBT          -0.58
    SPA          -0.57
    POSITIVE LOGITS
    vous         0.95
    ÅĤ           0.87
    owicz        0.77
    acan         0.76
    henko        0.75
    ewski        0.72
    inis         0.71
    án           0.70
     Mehran      0.70
    adesh        0.69
    Act Density 0.235%

    No Known Activations