    Explanations

    acquiring or obtaining something

    Negative Logits
        с                     1.62
        (token not rendered)  1.45
        える                  1.32
        (token not rendered)  1.27
        س                     1.26
        ب                     1.23
        ین                    1.21
        ра                    1.16
        ia                    1.13
        1                     1.09
    Positive Logits
        '                     1.89
        (token not rendered)  1.61
        on                    1.26
        ri                    1.22
        u                     1.11
        ro                    1.10
        x                     1.08
        )                     1.07
        m                     1.06
        w                     1.06
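    Lists like the negative and positive logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix (direct logit attribution) and taking the lowest and highest scores. This page does not state its method, so the sketch below assumes that technique; the function name `top_logit_effects`, the argument shapes, and the toy data are hypothetical.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary.

    Assumed shapes (hypothetical, for illustration):
      w_dec_row: (d_model,) decoder vector for a single feature
      w_u:       (d_model, d_vocab) unembedding matrix
      vocab:     list of d_vocab token strings
    Returns (top_positive, top_negative), each a list of (token, score).
    """
    # One direct logit effect per vocabulary token.
    scores = w_dec_row @ w_u
    # Indices sorted from most negative to most positive score.
    order = np.argsort(scores)
    top_negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    top_positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return top_positive, top_negative


# Toy example: a 4-dimensional residual stream and a 6-token vocabulary.
vocab = ["a", "b", "c", "d", "e", "f"]
w_u = np.zeros((4, 6))
w_u[0] = [0.5, -1.0, 2.0, 0.0, -0.3, 1.2]
pos, neg = top_logit_effects(np.array([1.0, 0.0, 0.0, 0.0]), w_u, vocab, k=2)
print(pos)  # [('c', 2.0), ('f', 1.2)]
print(neg)  # [('b', -1.0), ('e', -0.3)]
```

    Real dashboards may also apply the final layer norm or other scaling before the projection; this sketch skips that.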
    Activation Density: 0.165%
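    Activation density is usually read as the fraction of tokens on which the feature fires, so 0.165% corresponds to roughly 1.65 activating tokens per 1,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the sample activations are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


# Two of eight tokens activate, so the density is 0.25 (25%).
print(activation_density([0, 0, 0.5, 0, 1.2, 0, 0, 0]))  # 0.25
```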

    No Known Activations