Explanations
references to ongoing projects and collaborative efforts
Negative Logits
wit           -0.17
ades          -0.15
ss            -0.15
ška           -0.15
apolis        -0.14
askell        -0.14
yen           -0.14
Configurer    -0.14
itus          -0.14
Bake          -0.14
Positive Logits
done          0.18
enz           0.17
imony         0.17
ersh          0.16
indsight      0.15
ington        0.15
amental       0.15
aday          0.15
ite           0.14
-done         0.14
Activations Density 0.073%