    Explanations

    choosing the best option

    Negative Logits
    Token  Logit
    "Zr"  0.39
    " ಅವರ"  0.39
    "くい"  0.39
    "вань"  0.38
    " কুক"  0.38
    "预期"  0.37
    "rennen"  0.37
    "চলে"  0.37
          0.36
    "stripos"  0.35
    Positive Logits
    Token  Logit
    " Choose"  0.95
    "Choose"  0.92
    " choose"  0.88
    "choose"  0.81
    " chooses"  0.80
    "選擇"  0.79
    " choosing"  0.76
    " pilih"  0.74
    " Pilih"  0.73
    "选择"  0.70
    Activation Density: 0.007%

    No Known Activations