INDEX
Explanations
names of specific individuals
proper nouns, particularly names and surnames
Negative Logits
corrective       -0.68
hikers           -0.68
Op               -0.65
ocular           -0.65
ilation          -0.64
appropriately    -0.63
spect            -0.62
UV               -0.62
EMP              -0.62
CL               -0.61
Positive Logits
theless          1.24
wald             1.09
die              1.05
erick            1.00
ragon            0.97
fried            0.94
sson             0.93
igham            0.91
fred             0.90
ersen            0.89
Activations Density 0.013%