INDEX
Explanations
names of prominent figures and their associated attributes in various contexts
Negative Logits
Ashe     -0.17
ADATA    -0.17
bedo     -0.16
jab      -0.16
rete     -0.15
ATYPE    -0.15
bum      -0.15
antro    -0.15
umar     -0.15
izio     -0.15
Positive Logits
(Int     0.21
(IM      0.17
ibly     0.16
(IP      0.16
PB       0.14
CVS      0.14
outr     0.14
(IT      0.14
/I       0.14
Tod      0.14
Activations Density: 0.221%