INDEX
    Explanations

    names starting with Kir

    Negative Logits
    Token       Logit
    (blank)     -4.13
    (blank)     -2.73
    "!!!!!"     -2.66
    " metre"    -2.64
    (blank)     -2.53
    (blank)     -2.52
    " of"       -2.48
    (blank)     -2.48
    (blank)     -2.47
    (blank)     -2.47
    Positive Logits
    Token       Logit
    "1"         3.34
    "["         3.19
    "."         2.95
    "ные"       2.81
    " `"        2.78
    "9"         2.64
    "8"         2.59
    "3"         2.58
    "}"         2.50
    "n"         2.47
    Act Density 0.003%

    No Known Activations