INDEX
    Explanations

    references to "proof" and related concepts

    Negative Logits
    fax     -0.16
    odge    -0.15
    /Area   -0.15
    inux    -0.14
    unch    -0.14
    夫      -0.14
    ENCED   -0.14
    æĬŀ     -0.14
    ê»ĺ     -0.14
    uman    -0.14
    Positive Logits
    reading    0.33
    reader     0.25
    read       0.20
    -positive  0.19
    READING    0.18
    ed         0.18
    iness      0.18
     positive  0.18
     Positive  0.18
    enstein    0.18
    Activation Density 0.017%

    No Known Activations