INDEX
    Explanations

    references to individuals and the concept of deserving punishment

    Negative Logits
     Fonten            -0.76
     Arif              -0.74
     <<<<<<<<<<<<<<    -0.71
    </caption>         -0.67
    KommentareTeilen   -0.66
     חיצוניים          -0.66
    :✨                -0.66
     Estelle           -0.65
     DUR               -0.64
     >=",              -0.63
    Positive Logits
     cortes        0.67
    ätä            0.66
     magát         0.61
    piele          0.60
    jstor          0.60
    yatı           0.59
     vuotta        0.58
     viewDidLoad   0.57
    jét            0.57
     ruban         0.56
    Act Density 0.004%

    No Known Activations