    Explanations

    article/possessive adjective

    Negative Logits
    Token            Logit
    ".fore"          -0.07
    "خانه"           -0.07
    " SET"           -0.07
    " небольш"       -0.07
    "ηση"            -0.07
    "_ACTIVE"        -0.06
    " Navigator"     -0.06
    "Vs"             -0.06
    " reserve"       -0.06
    "ове"            -0.06
    Positive Logits
    Token            Logit
    "():↵↵"          0.07
    (not shown)      0.06
    " nerd"          0.06
    "plane"          0.06
    (not shown)      0.06
    (not shown)      0.05
    "meal"           0.05
    " ****"          0.05
    "ellation"       0.05
    "spawn"          0.05
    Activation Density: 0.027%

    No Known Activations