INDEX
    Explanations

    terms related to selection or choice

    Negative Logits
    "3"            -0.32
    "core"         -0.29
    " disfruta"    -0.28
    " dalamnya"    -0.28
    ","            -0.27
    " pensées"     -0.26
    " disfr"       -0.26
    "KeepAlive"    -0.25
    " rotational"  -0.25
    " core"        -0.25
    Positive Logits
    "選擇"         1.20
    "选择"         1.20
    " choosing"    1.05
    " 选择"        1.03
    " chọn"        1.03
    " choice"      1.02
    " 선택"        1.01
    "choosing"     0.99
    " lựa"         0.98
                   0.96
    Act Density 0.006%

    No Known Activations