INDEX
    Explanations

    numerical references and citations in academic texts

    Negative Logits
    Token    Logit
    95       -0.16
    edge     -0.16
    71       -0.15
    92       -0.15
    83       -0.15
    EDGE     -0.15
    격       -0.15
    Edge     -0.15
    deo      -0.15
    uga      -0.15
    Positive Logits
    Token    Logit
    000      0.42
    001      0.37
    002      0.36
    003      0.36
    004      0.35
    005      0.33
    006      0.29
    007      0.29
    008      0.27
    009      0.23
    Activation Density: 0.060%

    No Known Activations