INDEX
Explanations
proper nouns representing individuals
names of notable individuals along with their achievements or roles
Negative Logits
Token     Logit
.",       -0.82
!".       -0.75
".[       -0.72
".        -0.63
`.        -0.63
%.        -0.63
'."       -0.61
.""       -0.61
attRot    -0.61
.:        -0.60
Positive Logits
Token     Logit
*)        0.71
acronym   0.59
pires     0.58
)|        0.55
?)        0.54
Lloyd     0.54
)         0.53
?)        0.52
umbrella  0.51
)]        0.51
Activations Density 1.785%