    Negative Logits
    Token               Logit
    " fod"              -0.08
    " pans"             -0.07
    " kettle"           -0.07
    (no token shown)    -0.07
    " сл"               -0.07
    "εν"                -0.07
    " firef"            -0.07
    " paj"              -0.07
    " tapi"             -0.07
    " Edwards"          -0.07
    Positive Logits
    Token               Logit
    " That"             0.09
    " Hmm"              0.08
    (no token shown)    0.08
    "_questions"        0.08
    (no token shown)    0.07
    "Sue"               0.07
    " intuition"        0.07
    "Aa"                0.07
    "Hence"             0.07
    "人格"              0.07
    Activation Density: 0.033%

    No Known Activations