INDEX
    Explanations

    references to the concept of duality, contrast, or variations within a category

    instances of the suffix "ile" in words

    Negative Logits
    ĸļ          -0.96
    olver       -0.85
    iversal     -0.85
    ĵĺ          -0.85
    axter       -0.83
    esson       -0.82
    icter       -0.77
    icum        -0.77
    itivity     -0.76
    ington      -0.76
    Positive Logits
    mma         1.05
    tto         1.05
    zza         0.81
    tta         0.77
    tt          0.77
    ttes        0.74
    gged        0.73
    vich        0.73
    tsky        0.71
    vel         0.68
    Activation Density: 0.021%

    No Known Activations