INDEX
    Explanations

    phrases indicating a type of critique or evaluation

    Negative Logits
    Token        Logit
    "iez"        -0.18
    "ral"        -0.17
    "isman"      -0.17
    "inery"      -0.16
    "gis"        -0.16
    "/basic"     -0.15
    " rather"    -0.15
    " basic"     -0.15
    "basic"      -0.15
    "za"         -0.15
    Positive Logits
    Token             Logit
    " necessarily"    0.25
    " anymore"        0.23
    "ecessarily"      0.17
    " nor"            0.17
    " Drill"          0.16
    " usual"          0.16
    " particularly"   0.16
    " matter"         0.15
    "thing"           0.15
    " rocket"         0.15
    Activation Density: 0.071%

    No Known Activations