INDEX
    Explanations

    references to ceilings and related terminology

    Negative Logits
    Token        Logit
    "errer"      -0.19
    "raž"        -0.16
    "emean"      -0.15
    "maze"       -0.15
    " bare"      -0.14
    "ameleon"    -0.14
    "heit"       -0.14
    "bare"       -0.14
    "ette"       -0.14
    "ierrez"     -0.14
    Positive Logits
    Token        Logit
    "ILING"      0.22
    "asar"       0.21
    "YLON"       0.21
    "iling"      0.19
    "asing"      0.19
    "idla"       0.18
    "idot"       0.17
    "APTER"      0.16
    "stral"      0.16
    "ylon"       0.15
    Activation Density: 0.016%

    No Known Activations