INDEX
Explanations
names or proper nouns
specific proper nouns or unique identifiers
NEGATIVE LOGITS
proble     -0.80
assum      -0.70
contrace   -0.68
ال         -0.68
treasury   -0.66
conduc     -0.66
intimid    -0.66
princ      -0.66
bible      -0.66
覚醒       -0.65
POSITIVE LOGITS
bol         0.81
lesh        0.80
anth        0.79
ja          0.78
osa         0.77
nos         0.77
aro         0.76
ella        0.76
vier        0.75
Budd        0.74
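Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the extreme values. The sketch below assumes that procedure. The function names, the two-dimensional toy model, and its weights are hypothetical and are not taken from this feature.

```python
# Hypothetical sketch: assumes logit effects = decoder direction @ unembedding (W_U).

def logit_effects(decoder_dir, unembed):
    """Project a feature's decoder direction through the unembedding matrix.

    decoder_dir: d_model floats (hypothetical feature direction).
    unembed: d_model rows, each holding one float per vocabulary token.
    Returns one logit contribution per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = list(zip(tokens, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy usage with a made-up 2-dimensional model and a 4-token vocabulary.
tokens = ["bol", "lesh", "proble", "assum"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.6, 0.5, -0.6, -0.5],
    [0.4, 0.2, -0.4, -0.2],
]
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), tokens, k=2)
print("POSITIVE", positive)
print("NEGATIVE", negative)
```

A real dashboard would run this over the full vocabulary and keep the top 10 on each side, which gives lists shaped like the ones above.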
Activations Density 0.847%
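Activation density is usually read as the percentage of sampled tokens on which the feature fires. Below is a minimal sketch, assuming that "fires" means an activation strictly above zero. The threshold and the helper name are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count tokens on which the feature is active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy usage: 847 firing tokens out of 100,000 sampled tokens.
sample = [1.0] * 847 + [0.0] * (100_000 - 847)
print(f"Activations Density {activation_density(sample):.3f}%")
```

Under this definition, a density of 0.847% means the feature is active on fewer than 1 token in 100, which makes it a sparse feature.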