INDEX
    Explanations

    strong verbs and action-related words in various contexts

    Negative Logits
    Token          Logit
    "ahn"          -0.17
    "atch"         -0.14
    "egal"         -0.14
    "aran"         -0.14
    "emoc"         -0.14
    "Lon"          -0.13
    " postpone"    -0.13
    "urai"         -0.13
    "anders"       -0.13
    " dagen"       -0.13
    Positive Logits
    Token          Logit
    " etc"         0.27
    " тощо"        0.21
    "etc"          0.18
    "illac"        0.15
    "dit"          0.15
    "áº"           0.15
    "ffects"       0.14
    "eson"         0.14
    "agnosis"      0.14
    "以及"         0.14
    Activation Density: 0.193%
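
The density figure above is presumably the share of sampled tokens on which this feature fires. The source does not define it, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations over 1,000 tokens: two fire, the rest are zero.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.193% would mean the feature is active on roughly 2 of every 1,000 tokens in the sampled text.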

    No Known Activations