    Explanations

    words related to certain specific numerical values or measurements

    Negative Logits
    atal      -0.17
    erv       -0.16
    ep        -0.15
    ugo       -0.15
    awn       -0.15
    aram      -0.15
    okol      -0.15
    ivery     -0.15
    psc       -0.15
    endoza    -0.15
    Positive Logits
    ежду      0.27
    ног       0.26
    ного      0.25
    ax        0.23
    нение     0.23
    нения     0.23
    олод      0.22
    нож       0.22
    лад       0.21
    узы       0.20
    Activation Density: 0.009%

    No Known Activations