INDEX
    Explanations

    phrases indicating strong opinions or evaluations

    statements about problems or conditions that lead to significant consequences

    Negative Logits
    "FN"          -0.65
    "known"       -0.61
    "SD"          -0.60
    "zig"         -0.60
    "pring"       -0.59
    "than"        -0.59
    "odd"         -0.59
    "alias"       -0.58
    "episode"     -0.58
    "wig"         -0.57
    Positive Logits
    " deserves"    1.45
    " attracts"    1.24
    " evolves"     1.19
    " requires"    1.19
    " belongs"     1.18
    " seeks"       1.16
    " needs"       1.15
    " inspires"    1.14
    " strives"     1.14
    " relies"      1.14
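The page lists the tokens with the largest positive and negative logit values but does not say how they were computed. A common approach for feature cards of this kind, assumed here and not confirmed by this page, is to take the dot product of the feature's output direction with each token's column of the model's unembedding matrix, then keep the k highest and k lowest scores. The pure-Python sketch below does exactly that on toy numbers; `logit_effects`, `top_logits`, and the example matrix are hypothetical.

```python
from typing import List, Sequence, Tuple


# Assumption: a token's logit score is the dot product of the feature's output
# direction with that token's column of the unembedding matrix (d_model x vocab).
def logit_effects(feature_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    vocab_size = len(unembed[0])
    return [
        sum(feature_dir[i] * unembed[i][t] for i in range(len(feature_dir)))
        for t in range(vocab_size)
    ]


def top_logits(scores: Sequence[float], vocab: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores, most negative first
    positive = ranked[::-1][:k]    # highest scores, most positive first
    return positive, negative


# Hypothetical toy model: 2-dimensional residual stream, 3-token vocabulary.
feature_dir = [1.0, -1.0]
unembed = [[1.0, 0.0, 2.0],
           [0.0, 1.0, 0.5]]
vocab = [" a", " b", " c"]

scores = logit_effects(feature_dir, unembed)  # [1.0, -1.0, 1.5]
positive, negative = top_logits(scores, vocab, k=1)
print(positive, negative)  # [(' c', 1.5)] [(' b', -1.0)]
```

Under this assumption, a positive entry means the feature's direction raises that token's logit and a negative entry means it lowers it.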
    Activation Density: 0.183%
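Assuming activation density is the percentage of sampled tokens on which the feature's activation is above zero (the page states neither the sample nor the threshold), 0.183% corresponds to about 1.83 active tokens per 1,000. The hypothetical helper `activation_density` below sketches that calculation.

```python
from typing import Sequence


# Assumption: density = share of sampled tokens whose activation exceeds a
# threshold (0.0 by default), reported as a percentage. The actual corpus and
# threshold behind the figure on this page are not stated.
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 183 active tokens out of 100,000.
sample = [1.0] * 183 + [0.0] * (100_000 - 183)
print(f"{activation_density(sample):.3f}%")  # 0.183%
```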

    No Known Activations