INDEX
    Explanations

    terms related to social issues and community influences

    Negative Logits
    rek       -0.17
    aron      -0.17
    edin      -0.15
    orta      -0.15
    ÅĻej      -0.15
    li        -0.14
    burger    -0.14
    .nano     -0.13
    cono      -0.13
    ël        -0.13
    Positive Logits
     Ù쨥ÙĨ      0.18
     Ø¥ÙĦا      0.18
    è°·         0.17
     certainly  0.17
    æĭ¬         0.16
    åIJ¦        0.16
     still      0.15
     Still      0.15
    -valu       0.15
    iske        0.15
    Activation Density: 0.100%

    No Known Activations