INDEX
    Explanations

    square brackets

    Negative Logits
     sensible     -0.06
     мав          -0.06
     shed         -0.06
     injured      -0.06
    ANCH          -0.06
     могут        -0.06
     було         -0.06
    	base         -0.06
     reference    -0.06
     frightened   -0.06
    Positive Logits
    *num          0.07
     kvinnor      0.07
                  0.07
    Qual          0.06
    ileges        0.06
    /pdf          0.06
                  0.06
     downhill     0.06
    alg           0.06
    ”。↵↵          0.06
    Act Density 0.049%

    No Known Activations