    Explanations

    say "back or related"

    Negative Logits
    " enne"           -0.09
    ".Drawing"        -0.09
    " ciert"          -0.08
    "lacht"           -0.08
    " Translation"    -0.07
    " diciembre"      -0.07
    " Inputs"         -0.07
    " multa"          -0.07
    " Stand"          -0.07
    " transl"         -0.07
    Positive Logits
    " symmetrical"    0.14
    " through"        0.13
    ".inverse"        0.12
    " inverse"        0.12
    "through"         0.12
    "inverse"         0.12
    "Through"         0.12
    "(back"           0.11
    ".back"           0.11
    " back"           0.11
    Act Density 0.002%

    No Known Activations