INDEX
    Explanations
    Negative Logits
     fir              -0.08
     same             -0.08
     bothered         -0.07
    “But              -0.07
     deine            -0.07
     Offers           -0.07
    Module            -0.07
    `]                -0.07
    __↵↵              -0.06
                      -0.06
    Positive Logits
    Ś                 0.07
     contingent       0.07
     băng             0.07
    こんにちは        0.06
    andex             0.06
     Winner           0.06
                      0.06
    _LOCAL            0.06
    Peter             0.06
                      0.06
    Act Density 0.064%

    No Known Activations