INDEX
    Explanations

    terms related to social media platforms

    Negative Logits
    bandou      -0.54
    Paese       -0.53
    Dieu        -0.53
    Anda        -0.52
    I           -0.52
    maro        -0.52
    Waray       -0.51
    meni        -0.50
    Sheeran     -0.50
    endi        -0.49

    Positive Logits
    youtube      0.96
    delà         0.94
    english      0.89
    instagram    0.84
    gaussian     0.83
    wikipedia    0.83
    facebook     0.82
    wordpress    0.81
    japanese     0.80
    february     0.79
    Activation Density 0.335%

    No Known Activations