    Explanations

    verbs related to action and change

    Negative Logits
    "将"     -0.18
    " 将"    -0.16
    "uen"    -0.16
    "odel"   -0.15
    "387"    -0.15
    "將"     -0.15
    "ire"    -0.15
    "antz"   -0.15
    " ru"    -0.15
    "ルト"   -0.14
    Positive Logits
    "á"      0.34
    "án"     0.31
    "ía"     0.30
    "à"      0.29
    "ían"    0.22
    "anno"   0.21
    "emos"   0.21
    "ás"     0.21
    "ait"    0.20
    "ia"     0.19
    Activation Density: 0.007%

    No Known Activations