INDEX
    Explanations

    references to numerical values or quantities

    Negative Logits
     hel            -0.16
    bor             -0.15
    ozo             -0.15
    .Components     -0.15
     punch          -0.15
    iju             -0.14
    bour            -0.14
    ILA             -0.14
    важа            -0.14
    utton           -0.14
    Positive Logits
    eworld          0.16
    iggins          0.14
    icer            0.14
    egl             0.14
    ucas            0.14
    erville         0.14
    Į               0.13
    idot            0.13
    erton           0.13
    елен            0.13
    Act Density 0.033%

    No Known Activations