Explanations
model/product numbers and abbreviations
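Negative Logits
Token      Value
නමුත්      0.30   (Sinhala: "but")
Lúc        0.29   (Vietnamese: "when", "at [a time]")
nerdy      0.28
sarcasm    0.28
່ວນ        0.28   (Lao subword fragment)
じゃん      0.28   (Japanese colloquial tag, roughly "isn't it?")
ridicule   0.27
humbled    0.27
trivia     0.27
sexist     0.27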
Positive Logits
Token   Value
1       0.39
IS      0.39
2       0.38
II      0.37
ALL     0.37
AC      0.37
ID      0.36
IP      0.36
NS      0.36
6       0.36
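Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the most extreme entries. The sketch below assumes that approach; it is not this dashboard's code. `top_logit_tokens`, `w_dec`, and `w_u` are hypothetical names, and the toy vocabulary and numbers are illustrative only. In this sketch, negative-logit entries come out with negative values.

```python
import numpy as np


def top_logit_tokens(w_dec, w_u, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (assumed method).

    w_dec: shape (d_model,), the feature's decoder direction (hypothetical input).
    w_u:   shape (d_model, vocab_size), the unembedding matrix (hypothetical input).
    vocab: list of token strings of length vocab_size.
    """
    logits = w_dec @ w_u            # (vocab_size,): how much the feature moves each logit
    order = np.argsort(logits)      # ascending: most suppressed tokens first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 5-token vocabulary.
vocab = ["IS", "II", "sarcasm", "nerdy", "ID"]
w_dec = np.array([1.0, 0.0])
w_u = np.array([[0.4, 0.3, -0.3, -0.2, 0.35],
                [0.0, 0.0, 0.0, 0.0, 0.0]])
pos, neg = top_logit_tokens(w_dec, w_u, vocab, k=2)
print(pos)  # [('IS', 0.4), ('ID', 0.35)]
print(neg)  # [('sarcasm', -0.3), ('nerdy', -0.2)]
```

Sorting the full vocabulary once and slicing both ends keeps the two lists consistent with each other; for very large vocabularies, `np.argpartition` would avoid the full sort.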
Activation Density: 0.110%
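Activation density is usually reported as the share of sampled tokens on which a feature's activation is positive. The sketch below assumes that definition; `activation_density` and its `threshold` parameter are hypothetical, and the token counts are made up to show how a figure like 0.110% arises.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Illustrative data: the feature fires on 110 of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:110] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.110%
```

A density this low means the feature is silent on almost every token, which fits a narrow explanation such as model/product numbers and abbreviations.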