    Negative Logits
    Token         Logit
    ".We"         -0.08
    " Olson"      -0.07
    " expenses"   -0.07
    " expense"    -0.07
    (blank)       -0.07
    " Coal"       -0.07
    " WK"         -0.06
    " Standards"  -0.06
    (blank)       -0.06
    " الص"        -0.06
    Positive Logits
    Token         Logit
    " mirror"     0.16
    " mirrors"    0.13
    " Mir"        0.13
    " Mirror"     0.13
    "Mir"         0.11
    " Miranda"    0.11
    " mir"        0.10
    "mirror"      0.10
    "Mirror"      0.09
    " Ariel"      0.08
    Activation Density: 0.006%

    No Known Activations