INDEX
    Explanations

    tempting to do something

    Negative Logits
    та
    0.80
    тат
    0.77
    ۰
    0.77
    tower
    0.76
    ין
    0.75
     engender
    0.75
    ্টর
    0.75
    ِمض
    0.74
    zantine
    0.73
     irão
    0.72
    Positive Logits
    0.93
    ला
    0.88
     
    0.83
    deki
    0.82
     sys
    0.82
     sve
    0.80
    ienia
    0.80
     banget
    0.75
     .
    0.73
     làng
    0.73
    Act Density 0.002%

    No Known Activations