    Explanations

    references to social media platforms

    Negative Logits
    Token                 Logit
    " Taktlose"           -0.57
    " виправивши"         -0.52
    " Italijanski"        -0.50
    "AddTagHelper"        -0.47
    "qtype"               -0.47
    " Infórmanos"         -0.46
    "dealing"             -0.44
    "olerance"            -0.42
    " EnglishChoose"      -0.41
    "landır"              -0.41

    Positive Logits
    Token                 Logit
    "Instagram"            1.13
    " Instagram"           1.11
    " Facebook"            1.09
    "Facebook"             1.01
    " instagram"           0.95
    " FACEBOOK"            0.94
    " facebook"            0.93
    " INSTAGRAM"           0.92
    "instagram"            0.91
    " Twitter"             0.87

    Activation Density: 0.078%

    No Known Activations