INDEX
    Explanations

    references to specific individuals or proper nouns

    Negative Logits
    i              -0.32
    auf            -0.29
    iou            -0.27
    iens           -0.26
    ÛĮ             -0.26
    aes            -0.25
    aed            -0.24
    aan            -0.24
    a              -0.24
    aat            -0.24
    Positive Logits
    dest           0.24
    venture        0.23
    vertisement    0.23
    rian           0.23
    ia             0.22
    deo            0.21
    ler            0.21
    ewater         0.20
    imir           0.20
    ja             0.20
    Activation Density: 0.027%

    No Known Activations