INDEX
    Explanations

    references to dishonor and the consequences of betrayal

    Negative Logits
    Token       Logit
    " gave"     -0.33
    " threw"    -0.31
    " drew"     -0.30
    " wrote"    -0.29
    " grew"     -0.28
    " blew"     -0.28
    " saw"      -0.28
    " took"     -0.28
    " broke"    -0.26
    "took"      -0.25
    Positive Logits
    Token       Logit
    " taken"    0.40
    " gone"     0.40
    " seen"     0.38
    " spoken"   0.37
    " gotten"   0.36
    " flown"    0.36
    " Seen"     0.35
    " Taken"    0.35
    " eaten"    0.34
    "idden"     0.33
    Act Density: 0.117%

    No Known Activations