INDEX
    Explanations

    names with the format of first name followed by last name

    mentions of social media usernames or handles

    Negative Logits
    " instincts"       -0.66
    "ãĥĥãĥī"           -0.59
    "Ͻ"                -0.59
    " solicitation"    -0.56
    "ront"             -0.55
    " outsider"        -0.55
    "Ĥª"               -0.55
    "inct"             -0.54
    " inexper"         -0.53
    "eries"            -0.52

    Positive Logits
    " October"          0.96
    " September"        0.96
    " August"           0.95
    " December"         0.95
    " February"         0.94
    " November"         0.93
    " April"            0.93
    " July"             0.92
    " June"             0.91
    " January"          0.91

    Activation Density: 0.038%

    No Known Activations