INDEX
    Explanations

    the word 'yr' with different activation levels

    Negative Logits
     Ou           -0.81
     Fiesta       -0.72
    leeve         -0.71
     Ic           -0.70
     Shack        -0.68
     Papua        -0.68
     Bloom        -0.66
    ļé            -0.66
     Canal        -0.65
     Villa        -0.65
    Positive Logits
    rha           1.27
    interstitial  0.93
    rr            0.90
    rh            0.90
    andom         0.85
    annis         0.83
    umph          0.82
    azines        0.82
    acial         0.80
    rocal         0.79
    Act Density 0.009%

    No Known Activations