Explanations
elements pertaining to boundaries and separation
Negative Logits
Token      Logit
acket      -0.15
xaf        -0.15
akedown    -0.15
ÃŃc        -0.15
loat       -0.14
ondo       -0.14
Sap        -0.14
ITES       -0.14
strom      -0.14
Positive Logits
Token      Logit
aldi       0.17
oux        0.17
ardy       0.17
ardi       0.16
-wall      0.16
asant      0.15
.spin      0.15
ivant      0.15
walls      0.14
barrier    0.14
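The logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how such lists are commonly produced. It assumes each score is the feature's decoder direction projected through the model's unembedding matrix. The names `decoder_dir`, `unembed`, and `vocab` are hypothetical, and the numbers are toy values rather than real model weights.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Return (token, score) pairs with score_t = sum_i decoder_dir[i] * unembed[i][t].

    Assumption: decoder_dir has length d_model, and unembed is a
    d_model x vocab_size nested list (rows = residual dims, columns = tokens).
    """
    scores = []
    for t, token in enumerate(vocab):
        s = sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        scores.append((token, s))
    return scores


def top_and_bottom(scores, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["walls", "barrier", "acket", "loat"]
unembed = [
    [0.5, 0.4, -0.3, -0.2],
    [0.2, 0.1, -0.1, -0.2],
]
decoder_dir = [0.2, 0.1]
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and reading both ends keeps the sketch simple. A real vocabulary has tens of thousands of tokens, so a vectorized matrix product and a partial top-k selection would be the practical choice there.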
Activation density: 0.195%
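Activation density is the share of tokens on which the feature fires. The sketch below assumes that "fires" means an activation strictly above zero over a sample of tokens. The threshold and the function name are assumptions and may not match the dashboard's exact definition.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Illustrative arithmetic: firing on 39 of 20,000 sampled tokens gives 0.195%.
acts = [0.0] * 20_000
for i in range(39):
    acts[i * 500] = 1.0  # 39 distinct positions set to a positive activation
print(f"{activation_density(acts):.3f}%")
```

At 0.195% the feature fires on roughly 1 token in 500, so it is a fairly sparse feature.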