INDEX
    Explanations

    numerical data and figures within the text

    Negative Logits
    Token           Logit
    "icz"           -0.15
    "lyn"           -0.15
    "formik"        -0.15
    " TEN"          -0.14
    "71"            -0.14
    "ofire"         -0.14
    "ruz"           -0.14
    "PullParser"    -0.14
    "eg"            -0.14
    " Ten"          -0.14
    Positive Logits
    Token           Logit
    "980"           0.20
    "440"           0.16
    "20"            0.16
    "960"           0.16
    "19"            0.16
    "pur"           0.16
    "420"           0.16
    "Ĥ"             0.16
    "460"           0.16
    "580"           0.15
    Activation Density: 0.187%

    No Known Activations