INDEX
    Explanations

    phrases indicating identity, emotional reactions, or situational contexts

    Negative Logits
    Token         Logit
    (blank)       -0.41
    "are"         -0.40
    " P"          -0.39
    "</h2>"       -0.38
    " bad"        -0.38
    " s"          -0.38
    " sur"        -0.38
    " du"         -0.38
    "-"           -0.37
    " Paul"       -0.37
    Positive Logits
    Token         Logit
    " propOrder"  1.28
    " myſelf"     1.19
    " متعلقه"     1.16
    " Monfieur"   1.00
    "ſelf"        0.99
    " Jefus"      0.95
    " Efq"        0.94
    " itſelf"     0.94
    "ſelves"      0.94
    " ſche"       0.94
    Act Density 0.082%

    No Known Activations