INDEX
    Explanations

    Islam/religion

    Negative Logits
     существуют     -0.08
     anecd          -0.08
     немец          -0.08
    .strict         -0.08
     primer         -0.07
     invoking       -0.07
     তা             -0.07
     primo          -0.07
     principal      -0.07
     протест        -0.07
    Positive Logits
     barren         0.09
     advantageous   0.08
    ategoria        0.08
    ategor          0.08
     flirt          0.07
    averse          0.07
    ниц             0.07
    category        0.07
     Sharks         0.07
                    0.07
    Act Density 0.016%

    No Known Activations