    Negative Logits
    Token          Logit
    "Heading"      -0.06
    " навк"        -0.06
    ".ops"         -0.06
    " الأك"        -0.06
    "centroid"     -0.06
    " Strand"      -0.06
    " během"       -0.06
    " litres"      -0.06
    "hydro"        -0.06
    " könnte"      -0.06
    Positive Logits
    Token          Logit
    " please"      0.12
    " Please"      0.10
    "Please"       0.09
    " PLEASE"      0.08
    " CLAIM"       0.08
    "aise"         0.08
    "please"       0.08
    " disappoint"  0.07
    "Press"        0.07
                   0.07
    Act Density 0.046%

    No Known Activations