INDEX
    Explanations

    people and their associated actions

    Negative Logits
    .”                    0.16
    。”                   0.15
     Contains             0.15
    ?.                    0.15
     decomposition        0.15
    .}                    0.14
    [whitespace token]    0.14
     Charging             0.14
    ."                    0.14
    ().                   0.14
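    Positive Logits
     have                 0.21
     recognize            0.18
     spend                0.17
     perceive             0.17
     spends               0.16
    들은                  0.16  (Korean plural suffix + topic particle)
    たちは                0.16  (Japanese plural suffix + topic particle)
     engage               0.16
     would                0.15
    [token not rendered]  0.15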
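The logit lists above are usually read as the tokens whose output logits a feature most directly raises or lowers. This page does not say how its values were computed. One common approach, assumed here, projects a sparse-autoencoder feature's decoder vector through the model's unembedding matrix and ranks the result. The function name `top_logit_effects` and the toy matrices are hypothetical illustrations, not this tool's implementation.

```python
import numpy as np


def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Rank tokens by the direct effect of one feature direction on the logits.

    Assumption: w_dec is a sparse-autoencoder decoder row of shape (d_model,)
    and W_U is the unembedding matrix of shape (d_model, vocab_size).
    """
    effects = w_dec @ W_U                 # (vocab_size,) logit change per token
    order = np.argsort(effects)           # token indices, most negative first
    k = min(k, len(vocab))
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with a hypothetical 5-token vocabulary and d_model = 2.
vocab = [" have", " recognize", ".”", " Contains", " would"]
W_U = np.array([[0.2, 0.1, -0.3, 0.0, 0.05],
                [0.0, 0.1, 0.0, -0.2, 0.1]])
w_dec = np.array([1.0, 0.5])

pos, neg = top_logit_effects(w_dec, W_U, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Because the effect is a single matrix product, this captures only the direct path from the feature to the output logits, not any influence routed through later layers.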
    Activation Density: 0.167%
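An activation density of 0.167% means the feature is active on roughly 1.67 of every 1,000 tokens, or about one token in 600. A minimal sketch, assuming density is the fraction of token positions whose activation exceeds zero; the helper `activation_density`, its `threshold` parameter, and the toy activations are hypothetical:

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature is active.

    Assumption: "active" means activation > threshold (0 by default).
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())


# Hypothetical activations over 6,000 tokens; the feature fires on 10 of them.
acts = np.zeros(6000)
acts[:10] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.167%
```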

    No Known Activations