    Explanations

    mentions of actions or instructions

    Negative Logits
    ndra          -0.85
    ambo          -0.82
    tions         -0.80
    -+-+          -0.77
    ntil          -0.70
    Ü             -0.68
    nell          -0.65
    otten         -0.64
    aza           -0.64
    ategories     -0.64
    Positive Logits
     stride       1.02
     cues         1.02
     seriously    1.00
     plunge       0.97
     reins        0.96
     cue          0.95
     liberties    0.92
     virginity    0.88
     aback        0.86
     lightly      0.84
    Activation Density: 1.518%

    No Known Activations