    Negative Logits
    Token          Logit
    "ідно"         -0.07
    "nice"         -0.07
    " dictates"    -0.07
    "airie"        -0.07
    "aptic"        -0.07
    "адження"      -0.07
    "ponent"       -0.07
    "(relative"    -0.06
    "\Core"        -0.06
    "Across"       -0.06
    Positive Logits
    Token          Logit
    "_show"        0.08
    "Show"         0.08
    " Show"        0.08
    "/show"        0.07
    " Open"        0.07
    " собира"      0.07
    "490"          0.07
    " setShow"     0.07
    " the"         0.07
    "Open"         0.07
    Activation Density 0.009%

    No Known Activations