    NEGATIVE LOGITS
    Token      Logit
    " and"     -2.63
    (blank)    -1.77
    " zout"    -1.70
    " lijn"    -1.65
    " is"      -1.64
    (blank)    -1.59
    (blank)    -1.57
    "oneer"    -1.56
    "quate"    -1.54
    "erea"     -1.52
    POSITIVE LOGITS
    Token          Logit
    (blank)        1.96
    " людей"       1.76
    " pengurus"    1.75
    " benötigt"    1.52
    " zweite"      1.48
    "情况下"       1.47
    (blank)        1.43
    " pensato"     1.41
    " друж"        1.41
    " dasarnya"    1.40
    Activation Density: 0.012%

    No Known Activations