INDEX
    Explanations

    instances of words related to taking specific actions or measures

    references to actions or measures taken

    Negative Logits
    Token        Logit
    "inately"    -0.84
    "ILLE"       -0.69
    "olls"       -0.69
    "orf"        -0.68
    "bid"        -0.67
    "gdala"      -0.66
    "raid"       -0.65
    "stown"      -0.65
    "ews"        -0.64
    "inite"      -0.64
    Positive Logits
    Token        Logit
    "iblings"    1.02
    " steps"     0.97
    "hooting"    0.94
    " Steps"     0.87
    "hops"       0.86
    "hent"       0.81
    "steps"      0.78
    " forward"   0.78
    "isters"     0.76
    " toward"    0.76
    Activation Density: 0.036%

    No Known Activations