Explanations
proper nouns, especially names and notable individuals
Negative Logits
enthal    -0.17
elib      -0.16
eman      -0.16
者        -0.15
ing       -0.15
фер       -0.15
emiz      -0.14
ος        -0.14
ATRIX     -0.14
бург      -0.14
Positive Logits
riel      0.16
itage     0.16
Cres      0.15
apesh     0.15
eres      0.15
pike      0.14
мин       0.14
ishi      0.14
ufe       0.14
adr       0.14
Activations Density: 0.028%