    Explanations

    acquiring information/actions

    Negative Logits
    Token             Logit
    ' taking'         -1.20
    ' Taking'         -1.13
    'Taking'          -1.08
    ' takes'          -0.87
    'taking'          -0.82
    ' take'           -0.80
    'er'              -0.76
    ' tomando'        -0.71
    ' takers'         -0.68
    ' tak'            -0.66

    Positive Logits
    Token             Logit
    ' pleaſure'        0.72
    ' poffe'           0.68
    'TemporalType'     0.66
    'ffions'           0.66
    ' leaſt'           0.65
    ' ſmall'           0.64
    'eace'             0.63
    '.*")]'            0.63
    '__).'             0.63
    ' poffible'        0.63

    Activation Density: 0.111%

    No Known Activations