Explanations
proper nouns indicating individuals or organizations
Negative Logits
Token   Logit
addir   -0.19
čet     -0.17
AGR     -0.17
unos    -0.17
abad    -0.17
ovat    -0.16
ODEV    -0.16
adol    -0.16
ikit    -0.16
heap    -0.16
Positive Logits
Token   Logit
bs      0.33
gs      0.29
ps      0.27
fs      0.26
ng      0.26
hs      0.26
kses    0.25
ff      0.25
ck      0.23
ds      0.23
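Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most positive and k most negative tokens. The following is a minimal sketch under that assumption. The function name `logit_effects` and its parameters are hypothetical, and real dashboards may also apply the final layer norm's scaling before the projection, which this sketch omits.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],          # hypothetical: feature decoder vector, length d_model
    unembed: Sequence[Sequence[float]],    # hypothetical: unembedding, shape [d_model][vocab_size]
    vocab: Sequence[str],                  # token strings, length vocab_size
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative direct logit effects.

    Assumes the effect on token t is the dot product of the decoder
    direction with column t of the unembedding matrix (no layer-norm scaling).
    """
    d_model = len(decoder_dir)
    # Direct contribution of the feature direction to each vocabulary logit.
    logits = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by logit so the most negative tokens come first.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative
```

With a toy two-dimensional model, `logit_effects([1.0, -0.5], [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]], ["bs", "abad", "gs"], k=1)` ranks "gs" as the top positive token and "abad" as the top negative one.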
Activation Density: 0.025%
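A density of 0.025% is a fraction of 0.00025, or roughly one firing per 4,000 tokens. The sketch below shows one way such a figure could be computed. It assumes that density means the share of sampled tokens on which the feature's activation is strictly above zero; the function name `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's own sampling corpus and threshold are not stated here.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Guard against an empty sample rather than dividing by zero.
    return active / total if total else 0.0


# Hypothetical sample: one firing token among 4,000.
sample = [0.0] * 3999 + [1.7]
density = activation_density(sample)
print(f"{density * 100:.3f}%")
```

Counting with a plain loop keeps the function usable on a streaming iterator of activations, so the whole sample never needs to sit in memory.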