INDEX
Explanations
proper nouns such as names of individuals
the name "Webb" and related references in the text
Negative Logits
oras     -0.80
acerb    -0.77
haps     -0.77
ãĥ´      -0.73
Aram     -0.70
CENT     -0.70
onym     -0.69
ãĥ¡      -0.69
stood    -0.69
cess     -0.68
Positive Logits
Webb      0.92
atcher    0.86
swick     0.84
enegger   0.83
hedon     0.78
Dixon     0.78
orld      0.76
ithing    0.76
yer       0.75
Weaver    0.73
Activations Density 0.030%