INDEX
    Explanations

    references to Russia and its political figures

    Negative Logits
     ujednoznacz      -0.55
     NUKAT            -0.50
     paper            -0.50
    Dtor              -0.48
     tips             -0.47
     beauty           -0.47
     outdoor          -0.45
     article          -0.45
    Jsp               -0.44
     video            -0.44
    Positive Logits
    Ƚ                 0.41
    PhysRevLett       0.40
    UniformLocation   0.38
     hipótesis        0.37
     nemico           0.37
     meille           0.36
    gatron            0.36
     nikt             0.36
    ennemi            0.36
     nemici           0.36
    Activation Density 0.032%

    No Known Activations