    Negative Logits
    Token             Logit
    "./"              -0.08
    " sorrow"         -0.07
    " Wilde"          -0.07
    " plán"           -0.07
    "=form"           -0.07
    " intl"           -0.07
    ".character"      -0.06
    " tutorials"      -0.06
    " paragraphs"     -0.06
    ".Forms"          -0.06
    Positive Logits
    Token             Logit
    "leta"            0.07
    " VH"             0.06
    " enjo"           0.06
    "dream"           0.06
    " chatte"         0.06
    " OO"             0.06
    "feit"            0.06
    (blank)           0.06
    " geil"           0.06
    " nær"            0.05
    Activation Density: 0.025%

    No Known Activations