    Negative Logits
    Token     Logit
    incinn    -0.19
    illa      -0.17
    alsex     -0.15
    hammad    -0.15
    leton     -0.14
    chio      -0.14
    zim       -0.14
    WD        -0.14
    erals     -0.13
    ître      -0.13
    Positive Logits
    Token     Logit
    osate     0.21
    uxtap     0.17
    ties      0.16
    ily       0.15
    ście      0.15
    uria      0.15
    uars      0.15
    ordin     0.14
    theless   0.14
    де        0.14
    Activation Density: 0.306%

    No Known Activations