    Negative Logits
    Token             Logit
    "wap"             -0.08
    " louis"          -0.07
    "lash"            -0.07
    "/new"            -0.06
    "ointments"       -0.06
    "/code"           -0.06
    (token missing)   -0.06
    "legates"         -0.06
    "-is"             -0.06
    " Complaint"      -0.06
    Positive Logits
    Token             Logit
    "braco"           0.07
    "]-"              0.07
    "(true"           0.07
    "Invocation"      0.06
    " stepped"        0.06
    (token missing)   0.06
    " навч"           0.06
    " alone"          0.06
    (token missing)   0.06
    " terr"           0.06
    Activation Density: 0.006%

    No Known Activations