INDEX
    Explanations

    <|endoftext|>

    Negative Logits
    Token          Logit
    " Boards"      -0.08
    " boards"      -0.08
    "bauer"        -0.08
    " στον"        -0.08
    " Frequ"       -0.08
    "(freq"        -0.08
    " práct"       -0.08
    "äufig"        -0.08
                   -0.08
    " notable"     -0.08
    Positive Logits
    Token          Logit
    " essay"        0.09
    " narr"         0.08
    "essay"         0.08
                    0.07
                    0.07
    " narration"    0.07
    "ろしく"         0.07
    " tas"          0.07
    " hmm"          0.07
    " pár"          0.07
    Activation Density: 0.044%

    No Known Activations