Explanations
references to projects and their progress
Negative Logits
darn      -0.20
folks     -0.17
">&#      -0.17
ebi       -0.16
quip      -0.15
folk      -0.15
éĻ        -0.15
folk      -0.15
AIT       -0.15
ÑĤÑĢен    -0.14
Positive Logits
fuck      0.25
âĢŀ       0.25
fuck      0.22
fucked    0.21
“         0.20
fucks     0.20
FUCK      0.19
fucking   0.19
kind      0.19
cunt      0.19
Activations Density 0.007%
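
To put the density figure in concrete terms, here is a minimal sketch that assumes "activation density" means the fraction of token positions on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the sample data are illustrative assumptions and are not taken from this page. Under that reading, 0.007% corresponds to roughly 7 active positions per 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density is counted over token positions, and a position is
    "active" when its activation is strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 7 of which activate the feature.
sample = [0.0] * 99_993 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.007%
```

A density this low suggests the feature fires on a very narrow set of contexts, which is consistent with a small, specific group of tokens dominating the positive logits.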