INDEX
Explanations
references to specific locations or entities marked by the letter 'H'
Negative Logits
anken     -0.17
isse      -0.16
chor      -0.15
ost       -0.15
uffman    -0.15
osta      -0.15
063       -0.14
Gard      -0.14
abilit    -0.14
isser     -0.14
Positive Logits
erts      0.24
engo      0.24
ert       0.23
umber     0.20
udder     0.18
oun       0.17
agger     0.17
MP        0.16
amp       0.16
udd       0.16
Activations Density 0.012%