    Explanations

    references to social interactions and experiences

    Negative Logits
    ſelf      -0.75
     houſe    -0.73
     purpoſe  -0.73
     obſ      -0.71
     neceſſ   -0.69
     feroit   -0.69
     pouvoit  -0.69
     becauſe  -0.68
     Eſ       -0.68
     neceff   -0.68
    Positive Logits
     nab       0.78
     snag      0.74
     tuck      0.73
     sna       0.73
     chow      0.71
     popped    0.71
     indulged  0.70
     sneak     0.70
     donned    0.70
     grab      0.69
    Activation Density 0.422%

    No Known Activations