INDEX
    Explanations

    Twitter handles to follow

    instances of the word "Follow" indicating social media references

    Negative Logits
    Token        Logit
    "pite"       -0.84
    "inese"      -0.73
    " ILCS"      -0.71
    "negie"      -0.67
    "pressed"    -0.66
    "cit"        -0.66
    "unicip"     -0.66
    "rouse"      -0.65
    "ately"      -0.64
    "dfx"        -0.64
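    Positive Logits
    Token        Logit
    " Follow"    0.92
    "cies"       0.84
    "ers"        0.82
    "ership"     0.81
    "Follow"     0.78
    " @"         0.77
    " Updates"   0.75
    "ed"         0.72
    "follow"     0.69
    U+FE0F       0.69   (variation selector-16, shown in the raw byte-level rendering as "ï¸ı")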
    Activation Density: 0.020%

    No Known Activations