INDEX
    Explanations

    pronouns and verbs describing actions or states of being

    Negative Logits
    umm
    -0.16
     sage
    -0.15
    adla
    -0.14
     فور
    -0.14
    alu
    -0.14
    -demand
    -0.13
    longleftrightarrow
    -0.13
    wy
    -0.13
    ura
    -0.13
     tog
    -0.13
    Positive Logits
    esis
    0.16
    aunch
    0.14
    gings
    0.14
    endon
    0.14
    eri
    0.14
    ーチ
    0.14
    ammer
    0.14
     Count
    0.14
    enen
    0.14
    erve
    0.13
    Act Density 0.009%

    No Known Activations