Explanations
proper nouns, specifically names like "Lawrence."
the repeated mention of the name "Lawrence" in various contexts
Negative Logits
hedral     -1.00
graded     -0.89
ramid      -0.87
inately    -0.77
scribe     -0.76
minist     -0.75
uttered    -0.73
iliar      -0.73
arding     -0.71
cffff      -0.70
Positive Logits
Liver       1.12
Hague       0.84
Berkeley    0.83
Wel         0.82
Kra         0.82
ville       0.81
rence       0.81
burg        0.79
Summers     0.77
Lawrence    0.76
Activations Density: 0.041%