Explanations
elements related to written communication or documentation
Negative Logits
rape     -0.17
odo      -0.16
åĽ´      -0.16
obox     -0.15
ĥ½       -0.14
oda      -0.14
halt     -0.14
gang     -0.14
immer    -0.14
ogue     -0.13
Positive Logits
-side       0.26
sides       0.25
side        0.25
flips       0.25
flip        0.23
éĿ¢         0.23
surfaces    0.23
flipped     0.22
Flip        0.22
faces       0.22
Activations Density 0.091%
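
Some tokens in the logit lists (åĽ´, ĥ½, éĿ¢) look like display strings from a byte-level BPE vocabulary rather than damaged text. The sketch below assumes the GPT-2 style byte-to-character table, which this page does not confirm, and maps such strings back to UTF-8. The names bytes_to_unicode, CHAR_TO_BYTE, and decode_token are illustrative, not part of any tool referenced here.

```python
def bytes_to_unicode():
    """Build the GPT-2 style byte -> printable character table.

    Assumption: the dashboard's vocabulary uses this mapping.
    """
    # Bytes that are already printable keep their own code point.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n, in byte order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display):
    """Turn a token's display string back into text.

    Byte sequences that are not valid UTF-8 on their own (for example a
    token holding only continuation bytes) decode to U+FFFD.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["éĿ¢", "åĽ´", "ĥ½", "side"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, éĿ¢ decodes to 面 (face, side, surface), which fits the other positive-logit tokens, and åĽ´ decodes to 围. ĥ½ holds only continuation bytes, so it is a fragment of a longer character and cannot be decoded by itself.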