    Negative Logits
    elyn      -0.18
    tems      -0.18
    esc       -0.17
    kl        -0.17
    enburg    -0.16
    gil       -0.15
    estone    -0.15
    enz       -0.15
    rana      -0.14
    ka        -0.14
    Positive Logits
    abin       0.25
    ibbean     0.24
    avan       0.22
    riages     0.22
    cter       0.21
    acter      0.20
    bohydr     0.20
    acters     0.19
    riage      0.18
    afe        0.18
    Activation density: 0.021%

    No Known Activations