    Explanations

    references to pornographic material and related content

    Negative Logits
    "Sklici"             -0.61
    (token not shown)    -0.57
    " solidar"           -0.55
    "Reparto"            -0.54
    "Referencie"         -0.52
    "Kör"                -0.52
    (token not shown)    -0.49
    "Zunanje"            -0.49
    "Merk"               -0.48
    " kapital"           -0.48
    Positive Logits
    " porn"              1.23
    " pornography"       1.11
    " Porn"              1.10
    "Porn"               0.95
    "porn"               0.94
    " pamph"             0.71
    " unspeak"           0.65
    " surpl"             0.63
    " subgoals"          0.62
    " bourg"             0.62
    Act Density 0.071%

    No Known Activations