INDEX
    Explanations

    words indicating agreement or affirmation

    Negative Logits
    -0.81
     (
    -0.64
     "
    -0.58
     “
    -0.57
     -
    -0.54
      
    -0.53
     T
    -0.52
     can
    -0.51
    ↵↵
    -0.50
     set
    -0.49
    Positive Logits
     itſelf
    1.13
     שוליים
    1.09
     myſelf
    1.02
    enumi
    0.94
     transfieras
    0.90
    Tembelea
    0.89
    تقاوى
    0.87
    ſelf
    0.86
    BagLayout
    0.85
    parsedMessage
    0.85
    Act Density 0.543%

    No Known Activations