INDEX
Explanations
terms related to stability and instability
Negative Logits
yb        -0.16
etre      -0.15
ivity     -0.15
ous       -0.15
pell      -0.15
ETERS     -0.15
Pais      -0.15
owan      -0.14
eren      -0.14
fulness   -0.14
Positive Logits
stability     0.20
mate          0.20
coins         0.20
unstable      0.20
mates         0.19
Stability     0.19
ilty          0.18
coin          0.18
instability   0.18
stable        0.18
Activations Density 0.026%