Explanations
numeric values followed by currency symbols or measurements
Negative Logits
itſelf      -0.92
ſind        -0.88
"';         -0.86
Efq         -0.83
]),         -0.83
auffi       -0.82
་་          -0.82
iſt         -0.82
Asimismo    -0.81
}';         -0.80
Positive Logits
+           0.67
+           0.67
I           0.65
thing       0.65
or          0.64
crappy      0.64
%           0.62
@           0.61
whatever    0.61
&           0.60
Activations Density 0.257%