Explanations
instances of numerical values or counts
Negative Logits
rep       -0.15
zen       -0.15
Scholar   -0.15
Unsure    -0.14
fool      -0.14
iddy      -0.14
uppe      -0.14
.medium   -0.13
yster     -0.13
Fool      -0.13
Positive Logits
iare        0.17
otel        0.16
iez         0.16
_READONLY   0.15
ξι          0.15
ffield      0.15
ター        0.14
addtogroup  0.14
iero        0.14
_EOL        0.14
Activations Density: 0.000%