    Negative Logits
      Token            Value
      "ipation"        0.41
      "Nich"           0.39
      "Didn"           0.38
      " deprive"       0.35
      " заня"          0.35
      "대한"           0.34
      " مادر"          0.34
      (not rendered)   0.34
      "ريقة"           0.33

    Positive Logits
      Token            Value
      " FF"            0.77
      " BB"            0.73
      " MM"            0.72
      " GG"            0.71
      " LL"            0.71
      " KK"            0.70
      " HH"            0.70
      " BBB"           0.67
      " VV"            0.67
      " JJ"            0.66
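    On SAE feature dashboards, lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and taking the tokens with the largest and smallest resulting values. The page itself does not state its method, so the following is only a minimal NumPy sketch under that assumption. The function `top_logit_effects`, the arrays `w_dec_row` and `w_u`, and the toy vocabulary are hypothetical placeholders; real dashboards may also fold in normalization (e.g. the final LayerNorm), which this sketch skips.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Return the k most-promoted and k most-suppressed tokens for one feature.

    w_dec_row: (d_model,) decoder direction of the feature (hypothetical input)
    w_u:       (d_model, d_vocab) unembedding matrix of the model
    vocab:     list of d_vocab token strings
    """
    # Direct effect of the decoder direction on every vocabulary logit
    # (no final LayerNorm is applied here).
    effects = w_dec_row @ w_u
    order = np.argsort(effects)  # ascending order of effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy usage with random weights (not the model behind this feature).
rng = np.random.default_rng(0)
d_model, d_vocab = 8, 20
vocab = [f"tok{i}" for i in range(d_vocab)]
w_u = rng.standard_normal((d_model, d_vocab))
w_dec_row = rng.standard_normal(d_model)
pos, neg = top_logit_effects(w_dec_row, w_u, vocab, k=3)
for tok, val in pos:
    print(f"+ {tok!r} {val:.2f}")
for tok, val in neg:
    print(f"- {tok!r} {val:.2f}")
```

    Sorting once and slicing both ends keeps the positive and negative lists consistent with each other, since both come from the same effect vector.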
    Activation Density: 0.049%
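    Activation density is usually read as the percentage of token positions on which the feature is active. A minimal sketch of that arithmetic follows, assuming "active" means an activation strictly above zero. The `threshold` parameter and the toy activation array are hypothetical, not data from this feature. Under this definition, 0.049% corresponds to roughly 49 firing positions per 100,000 tokens.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where a feature's activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    # Count positions where the feature "fires" (activation strictly above threshold).
    firing = np.count_nonzero(acts > threshold)
    return 100.0 * firing / acts.size

# Toy example (hypothetical data): 49 firing positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:49] = 1.0
density = activation_density(acts)
print(f"Activation Density {density:.3f}%")  # Activation Density 0.049%
```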

    No Known Activations