INDEX
Explanations
sections of code or data that are visually structured or commented
Negative Logits
ayo       -0.18
ottie     -0.17
oya       -0.15
oid       -0.14
spa       -0.14
士        -0.14
ssel      -0.14
ucks      -0.14
Spoiler   -0.13
raith     -0.13
Positive Logits
ij        0.17
809       0.15
infra     0.14
Pir       0.14
689       0.14
Naz       0.14
Hum       0.14
apist     0.14
Pur       0.14
eneg      0.14
Activations Density: 0.020%