INDEX
Explanations
proper nouns, particularly names and places
specific names and terms
Negative Logits
Token        Logit
dress        -0.63
gong         -0.59
dır          -0.57
dairy        -0.55
Normdatei    -0.54
groups       -0.52
tattoo       -0.51
tó           -0.51
ground       -0.50
trivial      -0.50
Positive Logits
Token        Logit
ning         0.71
nnnn         0.68
NNNN         0.64
nnn          0.63
n            0.61
ned          0.61
nen          0.60
ization      0.56
na           0.52
ner          0.51
Activations Density: 1.706%