INDEX
Explanations
evaluate capabilities or test performance
Negative Logits
four        0.59
medical     0.58
nine        0.54
building    0.53
five        0.52
smoothing   0.52
pandemic    0.52
no          0.50
muscle      0.50
housing     0.49
Positive Logits
熹          0.51
のために    0.48
Eesti       0.46
Spieler     0.46
娑          0.46
HomePage    0.45
Ελλάδα      0.45
überras     0.45
滃          0.45
*)&         0.44
Activations Density 0.000%