INDEX
Explanations
terms related to success and failure, particularly regarding performance metrics
Negative Logits
Token    Logit
hoot     -0.22
hire     -0.20
iac      -0.19
hall     -0.18
ei       -0.17
sed      -0.17
erre     -0.16
hp       -0.16
hand     -0.16
hydro    -0.16
Positive Logits
Token    Logit
TING     0.33
achi     0.32
ting     0.28
ACHI     0.27
ler      0.27
parade   0.24
omi      0.22
lers     0.22
maker    0.21
REC      0.21
Activations Density 0.016%