    Explanations

    crude humor

    NEGATIVE LOGITS
    -pills
    -0.07
     захисту
    -0.07
    ^\
    -0.06
     Lup
    -0.06
     elapsedTime
    -0.06
    Decision
    -0.06
     petrol
    -0.06
     Favor
    -0.06
    Card
    -0.06
     adaptation
    -0.06
    POSITIVE LOGITS
    nee
    0.08
     bizarre
    0.07
    Kitchen
    0.07
    roach
    0.06
    0.06
    ing
    0.06
    湿
    0.06
     domaine
    0.06
     obscene
    0.06
     Primitive
    0.06
    Activation Density: 0.014%

    No Known Activations