INDEX
Explanations
numerical thresholds related to difficulty levels
Negative Logits
ovsky       -0.17
ese         -0.16
Baghd       -0.15
ller        -0.15
styleType   -0.15
ych         -0.15
Ames        -0.15
alse        -0.14
AGON        -0.14
ereotype    -0.14
Positive Logits
åĿĽ         0.17
rosse       0.15
obic        0.15
Sherman     0.14
uter        0.14
úi          0.14
aign        0.14
ostel       0.14
HQ          0.13
à¥Ĥद        0.13
Activations Density 0.030%