INDEX
    Explanations

    occurrences of the token "mis", typically as a word prefix, suggesting the feature tracks words about mistakes or missteps (e.g., "misbehavior", "misinformed")

    Negative Logits
    ILA             -0.67
    INGS            -0.65
     unto           -0.64
    eteria          -0.63
    ¯¯¯¯            -0.61
     sans           -0.61
     Destroyer      -0.60
     Robots         -0.60
     Ready          -0.59
    ieri            -0.59
    Positive Logits
    cellaneous      1.42
    appropri        1.30
    beh             1.18
    pelled          1.10
    behavior        1.10
    informed        1.09
    aligned         1.07
    character       1.03
    jud             1.01
    managed         1.01
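
The positive logits are easiest to read as continuations of a preceding "mis" token. The sketch below is only an illustration of that reading, not part of the dashboard: it assumes the feature fires on "mis" and joins that prefix to each promoted token. The names `PREFIX` and `join_with_prefix` are hypothetical; the token and logit values are copied from the list above.

```python
# Assumption: the feature fires on a "mis" token, so each promoted token
# is read as a continuation of that prefix.
PREFIX = "mis"

# (token, logit) pairs copied from the Positive Logits list.
positive_logits = [
    ("cellaneous", 1.42),
    ("appropri", 1.30),
    ("beh", 1.18),
    ("pelled", 1.10),
    ("behavior", 1.10),
    ("informed", 1.09),
    ("aligned", 1.07),
    ("character", 1.03),
    ("jud", 1.01),
    ("managed", 1.01),
]


def join_with_prefix(prefix, pairs):
    """Prepend the prefix to every token, keeping its logit alongside."""
    return [(prefix + token, logit) for token, logit in pairs]


completions = join_with_prefix(PREFIX, positive_logits)

# Print one joined string per line with its logit.
for word, logit in completions:
    print(f"{word:<16}{logit:5.2f}")
```

Some joined strings are complete words ("miscellaneous", "misaligned") and others are fragments ("misappropri", "misbeh", "misjud"), which is expected if the promoted tokens are subword pieces rather than whole words.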
    Act Density 0.014%

    No Known Activations