    Negative Logits
    " Petro"        -0.07
    " Patent"       -0.07
    " Peter"        -0.07
    " पश"           -0.07
    " Clinical"     -0.07
    " concent"      -0.06
    "Birth"         -0.06
    "_extension"    -0.06
    " Central"      -0.06
    " garg"         -0.06
    Positive Logits
    " avoid"        0.11
    " avoiding"     0.11
    " avoided"      0.09
    "745"           0.08
    " avoids"       0.08
    "VO"            0.08
    " Avoid"        0.07
    "Avoid"         0.07
    "VT"            0.07
    " Allies"       0.07
    Activation Density: 0.015%

    No Known Activations