INDEX
Explanations
specific names, entities, or proper nouns related to individuals or organizations
Negative Logits
add     -0.20
ate     -0.18
ee      -0.18
all     -0.18
ase     -0.18
ail     -0.18
ant     -0.17
ass     -0.17
et      -0.17
ado     -0.17
Positive Logits
zburg      0.25
sburg      0.24
burg       0.23
lymp       0.22
burgh      0.22
ksen       0.21
recht      0.21
strup      0.21
zheimer    0.21
chwitz     0.21
Activations Density: 0.032%