INDEX
    Explanations

    quotation marks in prompts

    Negative Logits
    " Stel"  -0.08
    " prü"  -0.08
    "<|endoftext|>"  -0.08
    " 현실"  -0.08
    " defect"  -0.08
    " klaar"  -0.07
    " proef"  -0.07
    "공지"  -0.07
    " Bekijk"  -0.07
    " Pedro"  -0.07

    Positive Logits
    " avoided"  0.10
    " phrases"  0.09
    "-language"  0.09
    "-sing"  0.08
    " વગર"  0.08
    "Avoid"  0.08
    " verzichten"  0.08
    " भाषा"  0.08
    " evita"  0.08
    "avoid"  0.08

    Activation Density: 0.001%

    No Known Activations