INDEX
Explanations
references to laypeople or non-experts in various contexts
Negative Logits
annies      -0.07
aria        -0.07
TRACE       -0.07
ispecies    -0.06
arium       -0.06
stellung    -0.06
coes        -0.06
idan        -0.06
inia        -0.06
icode       -0.06
Positive Logits
person      0.09
persons     0.09
man         0.08
men         0.08
erson       0.08
person      0.08
mans        0.07
lay         0.07
(non        0.07
ngo         0.07
Activations Density 0.002%