INDEX
    Explanations

    expressions related to personal opinions and choices

    Negative Logits
        "æ¾"                       -0.16
        "Äįet" (UTF-8: "čet")      -0.15
        "etwork"                   -0.15
        "rrha"                     -0.15
        "anca"                     -0.15
        "çĶ"                       -0.15
        "宾"                       -0.14
        " spis"                    -0.14
        "Trait"                    -0.14
        "Brains"                   -0.14
    Positive Logits
        "yp"                       0.16
        "erer"                     0.16
        "onn"                      0.14
        "åħ¥ãĤĮ" (UTF-8: "入れ")    0.14
        "äh"                       0.14
        " rig"                     0.14
        "hausen"                   0.14
        "Colon"                    0.14
        "quina"                    0.14
        " کاÙĦ" (UTF-8: " کال")    0.14
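On dashboards of this kind, positive and negative logits are usually obtained by projecting the feature's decoder direction through the model's unembedding matrix and keeping the extreme values at each end. The sketch below assumes exactly that and ignores any final layer norm. `logit_effects` and `top_logits` are hypothetical helper names, and the toy weights are made up for illustration. They are not taken from this feature or model.

```python
# Minimal sketch, assuming logit effect of token t = sum_i d_dec[i] * W_U[i][t]
# (decoder direction projected through the unembedding; final layer norm ignored).
# Helper names and toy numbers are hypothetical.
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the decoder direction (length d_model) with every token's
    unembedding column; `unembed` is stored as d_model rows x vocab columns."""
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_logits(effects: Sequence[float], tokens: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    negative = [(tokens[t], effects[t]) for t in order[:k]]
    positive = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 3 hypothetical tokens.
    effects = logit_effects([1.0, -0.5], [[0.2, -0.1, 0.0], [0.1, 0.3, -0.2]])
    pos, neg = top_logits(effects, ["a", "b", "c"], k=1)
    print("positive:", [(t, round(v, 2)) for t, v in pos])
    print("negative:", [(t, round(v, 2)) for t, v in neg])
```

A real dashboard would run this over the full vocabulary with k = 10, which matches the ten entries listed on each side here.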
    Activation Density: 0.280% (share of tokens on which the feature is active)
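An activation density of 0.280% means the feature fires on roughly 28 of every 10,000 tokens it was measured on. The following is a minimal sketch of that calculation. It assumes that "active" means an activation strictly above a threshold of zero. `activation_density` and its `threshold` parameter are hypothetical, not a dashboard API.

```python
# Minimal sketch: activation density = (# token positions with activation > threshold)
#                                      / (# token positions).
# Assumes threshold 0.0 defines "active"; function name is hypothetical.
from typing import Iterable, Sequence


def activation_density(activations: Iterable[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Fraction of token positions, across all sequences, where the
    feature's activation exceeds `threshold`."""
    fired = 0
    total = 0
    for seq in activations:
        for a in seq:
            total += 1
            if a > threshold:
                fired += 1
    if total == 0:
        raise ValueError("no token positions supplied")
    return fired / total


if __name__ == "__main__":
    # 28 active positions out of 10,000 corresponds to a density of 0.28%.
    acts = [[0.0] * 9972 + [1.0] * 28]
    print(f"{activation_density(acts) * 100:.3f}%")
```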

    No Known Activations