INDEX
    Explanations

    the name "Ko" followed by a number

    Negative Logits
    senal       -0.80
     Creed      -0.77
    glass       -0.74
    ب           -0.73
     narrator   -0.73
     à¨         -0.73
    Ö¼          -0.71
    天          -0.69
    IBLE        -0.68
    ingham      -0.66
    Positive Logits
    zzi         1.18
    osta        1.10
    zy          0.98
    essler      0.96
    jo          0.96
    etter       0.96
    unin        0.95
    eln         0.95
    ppa         0.95
    pps         0.94
    Act Density 0.015%

    No Known Activations