INDEX
    Explanations
    Negative Logits
    1        -0.15
    ↵↵       -0.14
    .↵↵      -0.13
    .↵       -0.13
     of      -0.11
    \n       -0.11
             -0.11
             -0.10
    。↵      -0.10
             -0.10
    Positive Logits
     Agr        0.07
                0.07
    孩子们      0.07
    报记者      0.07
                0.06
    Brian       0.06
    .ReadFile   0.06
    Pwd         0.06
    akk         0.06
    大蒜        0.06
    Act Density 0.264%

    No Known Activations