    Explanations

    references to academic journals or articles

    Negative Logits
    Token       Logit
    "ift"       -0.16
    " Mev"      -0.16
    "iston"     -0.15
    "edback"    -0.15
    "icker"     -0.15
    " Lowell"   -0.15
    "earer"     -0.14
    "afen"      -0.14
    "bidden"    -0.14
    " euch"     -0.14
    Positive Logits
    Token             Logit
    "awe"             0.15
    " =>$"            0.14
    "aldi"            0.14
    "lád"             0.14
    "ìĽĶ"             0.14
    "ellido"          0.14
    "rió"             0.14
    " productivity"   0.14
    "oxid"            0.14
    "ctype"           0.13
    Activation Density: 0.012%

    No Known Activations