INDEX
Explanations
instances of special characters or symbols
Negative Logits
Token    Logit
.        -0.35
S        -0.34
E        -0.33
C        -0.32
,        -0.30
A        -0.28
T        -0.27
P        -0.26
R        -0.24
D        -0.24
Positive Logits
Token        Logit
IFn          0.19
IIIK         0.17
vyk          0.16
VRTX         0.16
wdx          0.16
styleType    0.15
cctor        0.15
IRQ          0.15
cq           0.14
.xr          0.14
Activations Density: 0.005%