INDEX
    Explanations

    statements that express strong opinions or judgments

    Negative Logits
    Token               Logit
    " essentially"      -0.58
    " vooz"             -0.54
    "Essentially"       -0.54
    "<unused20>"        -0.53
    "<unused41>"        -0.53
    "<unused23>"        -0.52
    "<unused43>"        -0.52
    "<unused16>"        -0.52
    "<pad>"             -0.52
    "<unused8>"         -0.52
    Positive Logits
    Token               Logit
    " ModelExpression"  0.62
    " hipó"             0.42
    " swear"            0.35
    "AndEndTag"         0.35
    "typeorm"           0.35
    " kain"             0.34
    " espiritual"       0.32
    "tiger"             0.32
    " worse"            0.31
    " cuad"             0.31
    Activation Density: 0.147%

    No Known Activations