INDEX
    Explanations

    proper nouns, especially names of individuals

    Negative Logits
    CLASSIFIED    -0.73
    ¶ħ            -0.69
    Ń·            -0.66
     \'           -0.64
     Âł Âł        -0.63
    İĭ            -0.63
     Wilderness   -0.63
     ---------    -0.62
     Disneyland   -0.62
     ..........   -0.62
    POSITIVE LOGITS
    mort          0.86
    sin           0.84
    inf           0.79
    make          0.77
    top           0.76
    lip           0.76
    v             0.76
    win           0.76
    hex           0.74
    mor           0.74
    Activation Density: 0.276%

    No Known Activations