INDEX
    Explanations

    repetitive or comparative phrases indicating similarity and difference

    Negative Logits
    "seau"            -0.15
    "anford"          -0.15
    "tet"             -0.15
    "yro"             -0.14
    "pong"            -0.14
    "dependencies"    -0.14
    "Ð¡Ð¡Ðł"          -0.14
    "анг"             -0.14
    "Verifier"        -0.14
    "doing"           -0.14

    Positive Logits
    " justice"         0.35
    " Justice"         0.25
    " job"             0.25
    "justice"          0.24
    " things"          0.24
    " thing"           0.23
    " damage"          0.23
    " wrong"           0.23
    " jobs"            0.22
    "cket"             0.21

    Act Density 0.174%

    No Known Activations