INDEX
    Explanations

    questions beginning with "How."

    Negative Logits
    Token            Logit
    "most"           -0.17
    "ça"             -0.17
    " nowhere"       -0.15
    "šk"             -0.15
    "idis"           -0.14
    "ório"           -0.14
    " Wel"           -0.14
    "owell"          -0.14
    "cheon"          -0.14
    " it"            -0.14

    Positive Logits
    Token            Logit
    "itz"            0.18
    "ells"           0.16
    "IGHL"           0.15
    "dy"             0.15
    " does"          0.15
    "æĺ¯æĪij"         0.15
    " do"            0.14
    "ever"           0.14
    "ORIZONTAL"      0.14
    "ERSHEY"         0.14
    Activation Density: 0.043%

    No Known Activations