    Explanations

    mentions of specific names, likely related to a particular person or topic

    proper nouns, particularly names and places

    Negative Logits
        Token               Logit
        "istically"         -0.92
        "istic"             -0.71
        "ually"             -0.67
        "ities"             -0.65
        " Reconstruction"   -0.65
        "icals"             -0.64
        "ãĥĩ"               -0.63
        "senal"             -0.63
        " occ"              -0.61
        "istical"           -0.60
    Positive Logits
        Token               Logit
        "orthy"             1.10
        "riter"             1.04
        "olf"               0.99
        "atcher"            0.95
        "inders"            0.95
        "ritten"            0.95
        "erd"               0.94
        "atson"             0.94
        "orld"              0.92
        "ey"                0.90
    Activation Density: 0.076%

    No Known Activations