INDEX
Explanations
numerical values and references to specific data points
Negative Logits
isos         -0.15
ières        -0.14
nants        -0.14
enet         -0.14
enant        -0.14
uracy        -0.14
llib         -0.13
abit         -0.13
etched       -0.13
Christopher  -0.13
Positive Logits
Sug          0.18
翼           0.16
opis         0.15
-Col         0.14
roke         0.14
rame         0.14
hta          0.14
gal          0.14
ime          0.14
lum          0.14
Activations Density: 0.017%