INDEX
Explanations
names of people, potentially researchers or authors
occurrences of proper nouns and specific names
Negative Logits
Token     Logit
Truman    -0.90
dit       -0.80
Debor     -0.80
tur       -0.77
tor       -0.75
Totem     -0.74
nces      -0.73
TD        -0.72
dt        -0.72
tyr       -0.71
Positive Logits
Token     Logit
berg      0.94
arn       0.89
Goldberg  0.82
Brooke    0.80
ãĥ¯       0.73
iceberg   0.73
ong       0.71
Live      0.70
burg      0.69
org       0.68
Activations Density 0.359%