INDEX
    Explanations

    verbs indicating actions towards other people or things

    actions that involve treatment or outcomes impacting individuals or groups

    Negative Logits
    "conn"       -0.76
    "eur"        -0.68
    "rea"        -0.62
    "bomb"       -0.61
    " Newman"    -0.61
    "bow"        -0.61
    "zh"         -0.61
    "sw"         -0.61
    "leases"     -0.59
    "tone"       -0.58
    Positive Logits
    "ometimes"   1.10
    "omething"   1.01
    "paces"      0.95
    "hift"       0.90
    "ynthesis"   0.87
    "pace"       0.85
    "ilver"      0.84
    " Jagu"      0.84
    "ettings"    0.78
    "heet"       0.77
    Activation Density: 0.497%

    No Known Activations