    Explanations

    phrases related to reflection and decision-making processes

    Negative Logits
     whereas
    -0.18
     although
    -0.17
     tuy
    -0.15
     but
    -0.15
    dana
    -0.14
     nor
    -0.14
    vince
    -0.14
    しており
    -0.14
    itest
    -0.14
    ostel
    -0.14
    Positive Logits
     وت
    0.20
     แล
    0.19
    して
    0.19
    んで
    0.19
    ,把
    0.18
    えて
    0.18
    いて
    0.18
    然后
    0.17
    って
    0.17
    并
    0.17
    Act Density 0.413%

    No Known Activations