INDEX
    Explanations

    references to sources, potentially citations or footnotes

    formatted text or structures within the document

    Negative Logits
    Token            Logit
    "oor"            -0.80
    "ãĤ©"            -0.77
    "ãĥ¼ãĥ³"         -0.71
    "ient"           -0.71
    "æ©"             -0.69
    "ãĥ¼ãĥĨãĤ£"      -0.68
    "ãĤ¤ãĥĪ"         -0.68
    "çīĪ"            -0.67
    "ãĤ£"            -0.67
    "é¾įå"           -0.67
    Positive Logits
    Token            Logit
    "...]"           1.43
    "â̦]"            1.23
    "?]"             0.99
    "]["             0.91
    "Pg"             0.90
    ".]"             0.88
    "]."             0.87
    " ]["            0.86
    " ]"             0.86
    "!]"             0.85
    Act Density 0.021%

    No Known Activations