Explanations
named entities or proper nouns
Negative Logits
Token        Logit
sse          -0.19
Frank        -0.16
opper        -0.15
.protocol    -0.14
ffa          -0.14
prol         -0.14
Reserved     -0.14
:numel       -0.14
ortho        -0.14
ãĥ¬ãĥ¼       -0.14
Positive Logits
Token        Logit
porte        0.17
Kenn         0.16
ahir         0.16
undle        0.15
gages        0.15
iye          0.15
lian         0.14
idar         0.14
isbury       0.14
hani         0.14
Activations Density: 0.070%