INDEX
    Explanations

    names of people, potentially researchers or authors

    occurrences of proper nouns and specific names

    Negative Logits
    " Truman"   -0.90
    "dit"       -0.80
    " Debor"    -0.80
    "tur"       -0.77
    "tor"       -0.75
    " Totem"    -0.74
    "nces"      -0.73
    "TD"        -0.72
    "dt"        -0.72
    " tyr"      -0.71
    Positive Logits
    "berg"       0.94
    "arn"        0.89
    " Goldberg"  0.82
    " Brooke"    0.80
    "ãĥ¯"        0.73
    " iceberg"   0.73
    "ong"        0.71
    " Live"      0.70
    "burg"       0.69
    "org"        0.68
    Act Density 0.359%
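
For context on the "Act Density" figure above, here is a minimal sketch of how such a value is commonly computed: the fraction of tokens on which the feature's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration; the page does not state how its own figure was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 1,000 tokens: the feature fires on 4 of them.
sample = [0.0] * 996 + [1.2, 0.3, 2.5, 0.7]
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

Under this definition, a density of 0.359% would correspond to the feature firing on roughly 36 of every 10,000 tokens.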

    No Known Activations