INDEX
    Explanations

    quotes or direct speech used in context

    phrases that express evaluations or opinions

    Head Attr Weights
    0:0.12
    1:0.03
    2:0.07
    3:0.13
    4:0.06
    5:0.08
    6:0.05
    7:0.02
    8:0.16
    9:0.14
    10:0.07
    11:0.02
    Negative Logits
    "Accessory"     -1.16
    "�醒"           -1.16
    "jit"           -1.14
    "Contact"       -1.13
    " MFT"          -1.13
    "Downloadha"    -1.07
    "iaz"           -1.05
    "pora"          -1.04
    "etus"          -1.01
    "Topics"        -1.01
    Positive Logits
    "kered"         1.21
    "!)."           1.14
    " begg"         1.05
    "adan"          1.04
    [not rendered]  1.03
    " gmaxwell"     1.01
    " :)"           1.00
    " :-)"          0.99
    "!."            0.99
    " slee"         0.97
    Act Density 0.066%

    No Known Activations