INDEX
    Explanations
    Negative Logits
    " s"[1:]        -0.20
    "orld"          -0.17
    "n"             -0.16
    " particular"   -0.15
    " con"          -0.15
    " Bog"          -0.15
    "d"             -0.15
    "t"             -0.15
    " support"      -0.15
    "errat"         -0.15
    Positive Logits
    "noinspection"  0.17
    "HELL"          0.15
    ".Undef"        0.15
    "zeit"          0.15
    "yonel"         0.14
    "stre"          0.14
    "GINE"          0.14
    "olin"          0.14
    "atre"          0.14
    " milano"       0.14
    Activation Density: 0.098%

    No Known Activations