    Explanations

    References to collective actions or statements using the word "we".

    Sentences starting with "We".

    Negative Logits (token, logit)
        (token not rendered)   -0.51
        "ter"                  -0.50
        " perla"               -0.49
        " pomo"                -0.47
        " frapp"               -0.45
        " valer"               -0.45
        " ordering"            -0.44
        " objet"               -0.43
        "kof"                  -0.43
        "ly"                   -0.43

    Positive Logits (token, logit)
        " have"                 0.94
        "'):\n"                 0.93
        " don"                  0.92
        " can"                  0.91
        " would"                0.90
        " never"                0.89
        " still"                0.87
        " didn"                 0.86
        " had"                  0.85
        " wouldn"               0.84
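    Lists like the two above are usually produced by projecting the feature's
    decoder direction onto the model's unembedding matrix and ranking tokens by
    the result. The sketch below assumes exactly that procedure. The function
    name `top_logits` and its inputs (`w_dec_row`, `w_u`, `vocab`) are
    hypothetical, and the dashboard may apply extra scaling or normalization
    not shown here.

```python
import numpy as np


def top_logits(w_dec_row: np.ndarray, w_u: np.ndarray, vocab: list, k: int = 10):
    """Hypothetical helper: rank tokens by the feature's direct effect on logits.

    w_dec_row: decoder direction for one feature, shape (d_model,)
    w_u:       unembedding matrix, shape (d_model, vocab_size)
    vocab:     token strings, length vocab_size
    Returns (positive, negative) lists of (token, score) pairs.
    """
    scores = w_dec_row @ w_u          # shape (vocab_size,)
    order = np.argsort(scores)        # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with made-up weights (not taken from the dashboard above).
w_dec_row = np.array([1.0, 0.0])
w_u = np.array([[0.9, -0.5, 0.1, 0.0],
                [0.0,  0.0, 0.0, 0.0]])
vocab = [" have", "ter", " can", "ly"]
pos, neg = top_logits(w_dec_row, w_u, vocab, k=2)
print(pos)
print(neg)
```

    With these toy weights, " have" ranks as the most positive token and "ter"
    as the most negative one, mirroring the layout of the two lists above.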
    Activation density: 0.182%
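    Activation density is commonly read as the percentage of token positions
    where the feature fires. The sketch below assumes "fires" means an
    activation strictly greater than zero. The helper `activation_density` and
    its `threshold` parameter are hypothetical. Under this reading, 0.182%
    corresponds to about 182 active positions per 100,000 tokens.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds threshold."""
    return 100.0 * float(np.mean(acts > threshold))


# Synthetic activations: 182 firing positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:182] = 1.0
print(activation_density(acts))
```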

    No Known Activations