    NEGATIVE LOGITS
    Token               Logit
    " Shirt"            -0.07
    "_offer"            -0.07
    " Maar"             -0.06
    " Norm"             -0.06
    "rud"               -0.06
    " Wag"              -0.06
    " ministry"         -0.06
    "詳細"               -0.06
    " rop"              -0.06
    " VER"              -0.06
    POSITIVE LOGITS
    Token               Logit
    " projected"         0.07
    "failed"             0.07
    " capitalist"        0.07
    "organization"       0.07
    ".Auth"              0.07
    [token not shown]    0.06
    "_argument"          0.06
    " sway"              0.06
    "Generation"         0.06
    " editing"           0.06
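The positive and negative logit lists rank vocabulary tokens by how much this feature pushes their output logits up or down. A minimal sketch of how such lists can be produced, assuming (as is common for sparse-autoencoder dashboards, but not stated here) that the feature's decoder direction is multiplied by the model's unembedding matrix, ignoring any final LayerNorm. The names `logit_effects`, `top_and_bottom`, and the toy matrices and tokens are hypothetical.

```python
def logit_effects(decoder_dir, unembed):
    """Project a feature's decoder direction through an unembedding matrix.

    decoder_dir: d_model floats (hypothetical SAE decoder row for the feature).
    unembed: d_model x vocab matrix given as a list of rows.
    Returns one logit contribution per vocabulary token.
    Assumption: no LayerNorm or scaling is applied before the projection.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest effects first
    positive = ranked[::-1][:k]    # largest effects first
    return negative, positive


# Toy data: a 2-dimensional residual stream and a 4-token vocabulary.
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [-0.05, 0.04, -0.02, 0.03],
    [0.04, -0.06, 0.08, -0.02],
]

effects = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(effects, tokens, k=2)
print("negative:", [t for t, _ in negative])
print("positive:", [t for t, _ in positive])
```

A real dashboard would run the same ranking over the full vocabulary and keep the top ten in each direction, which is the shape of the two lists above.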
    Activation Density 0.007%
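An activation density of 0.007% means the feature fires on roughly 7 of every 100,000 tokens. A minimal sketch, assuming density is the fraction of sampled tokens on which the feature's activation is strictly positive; the function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature fires (activation > threshold).

    Assumption: density counts tokens with activation strictly above the
    threshold, divided by the total number of sampled tokens.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy example: 7 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(7):
    acts[i * 10_000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.007%
```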

    No Known Activations