INDEX
    Explanations

    references to social interactions and relationships

    Negative Logits
    Token        Logit
    ".           -0.85
     iſt         -0.77
     فريبيس      -0.76
     myſelf      -0.76
    ---+         -0.72
     ſind        -0.72
     poffible    -0.71
     himſelf     -0.69
     $_"         -0.69
     quæ         -0.69
    Positive Logits
    Token        Logit
     oh          1.67
     Oh          1.64
    Oh           1.53
    oh           1.40
     ah          1.33
    Ah           1.29
     Ah          1.28
     Wow         1.27
     wow         1.27
     Oops        1.27
    Activation Density: 0.474%

    No Known Activations