INDEX
    Explanations

    phrases indicating potential outcomes or implications

    phrases that indicate meaning and consequences

    Negative Logits
    "ĸļ"          -0.75
    " Omn"        -0.67
    " Jur"        -0.67
    " Wend"       -0.66
    " Britann"    -0.64
    "raint"       -0.63
    " Jal"        -0.61
    " Sabb"       -0.61
    "Defense"     -0.58
    " Herz"       -0.58

    Positive Logits
    "depending"   0.78
    "ESE"         0.72
    "gettable"    0.72
    " depending"  0.71
    "γ"           0.70
    "pole"        0.70
    " tricky"     0.70
    " safely"     0.69
    " easily"     0.68
    " GOODMAN"    0.68

    Activation Density: 0.293%

    No Known Activations