INDEX
    Explanations

    references to inductions into various halls of fame

    Negative Logits
    Token           Logit
    "acket"         -0.09
    "twig"          -0.08
    "urf"           -0.08
    "ãĤ¤ãĤ¯"        -0.07
    "ÑĪов"          -0.07
    "arf"           -0.07
    "Unused"        -0.07
    "ìĩ"            -0.07
    "ÑĢади"         -0.07
    "Äĥm"           -0.07
    Positive Logits
    Token           Logit
    " into"         0.08
    " hall"         0.08
    " Hall"         0.08
    " halls"        0.07
    "hall"          0.07
    "608"           0.06
    "into"          0.06
    "Hall"          0.06
    " permanent"    0.06
    "405"           0.06
    Activation Density: 0.005%

    No Known Activations