INDEX
Explanations
proper nouns, especially people.
words related to familial relationships and lineage
People's names
Negative Logits
InjectAttribute    -0.85
AssemblyProduct    -0.83
nakalista          -0.78
للمعارف            -0.77
########.          -0.77
يتيمه              -0.75
RegressionTest     -0.73
ſever              -0.72
myſelf             -0.71
Rohy               -0.71
Positive Logits
of                 0.55
                   0.50
üs                 0.50
of                 0.47
ah                 0.47
el                 0.47
nas                0.46
sius               0.45
AH                 0.44
(                  0.44
Activations Density 0.903%