    Negative Logits
     tense                          -0.08
    观点 (Chinese: "viewpoint")     -0.08
    aten                            -0.07
                                    -0.07
     wherein                        -0.07
    rent                            -0.07
    atenate                         -0.07
                                    -0.07
     satisfactory                   -0.07
     Jos                            -0.07
    Positive Logits
     Losing                          0.09
     losing                          0.08
     daddy                           0.08
     Hal                             0.08
     "\\                             0.08
     Messe                           0.07
     Shaun                           0.07
     Streets                         0.07
     lose                            0.07
     Mara                            0.07
    Activation Density: 0.002%

    No Known Activations