INDEX
    Explanations

    references to the social media platform Facebook

    Negative Logits
    "ocard"      -0.19
    "ovation"    -0.15
    "stav"       -0.15
    " Shields"   -0.14
    "@stop"      -0.14
    "wall"       -0.14
    "ield"       -0.14
    "ochond"     -0.14
    "urm"        -0.13
    " pr"        -0.13

    Positive Logits
    "uce"        0.16
    "ÏĤ"         0.15
    "igne"       0.14
    "igin"       0.14
    "ÅĻeh"       0.14
    "culate"     0.14
    "erson"      0.14
    "abama"      0.14
    "atables"    0.14
    "oles"       0.14
    Act Density 0.010%

    No Known Activations