INDEX
Explanations
proper nouns related to people and places
proper nouns and names associated with individuals
Negative Logits
progress      -0.69
ipeg          -0.68
correctness   -0.64
pts           -0.62
strous        -0.60
FIX           -0.59
selection     -0.59
prompt        -0.58
Selection     -0.57
omen          -0.57
Positive Logits
Jr            1.34
Sr            1.16
aka           1.06
Jr            0.96
who           0.93
who           0.93
QC            0.90
PhD           0.88
whose         0.88
whose         0.87
Activations Density: 0.283%