Explanations
elements related to strictness or severity
Negative Logits
ksi        -0.18
ariat      -0.17
aney       -0.16
iox        -0.15
меж        -0.15
LOCKS      -0.15
wdx        -0.15
Degrees    -0.15
ocks       -0.14
MESS       -0.14
Positive Logits
idity       0.36
ging        0.34
rig         0.34
Rig         0.34
ged         0.30
gers        0.29
rig         0.28
gs          0.25
ueur        0.24
gings       0.24
Activation Density: 0.005%