    Explanations

    expressions of strong personal feelings or preferences

    Negative Logits
    Token        Logit
    " really"    -0.16
    " very"      -0.15
    "说"         -0.15
    " Say"       -0.15
    " quite"     -0.14
    " Colleg"    -0.14
    "ndx"        -0.14
    "aed"        -0.14
    "IPS"        -0.14
    " saying"    -0.14

    Positive Logits
    Token        Logit
    " dig"       0.20
    " enjoyed"   0.17
    " luck"      0.17
    " digging"   0.16
    "üz"         0.16
    "asel"       0.16
    "dig"        0.15
    "μι"         0.15
    "connect"    0.15
    "eshire"     0.15
    Activation Density: 0.046%

    No Known Activations