INDEX
Explanations
references to workshop quality and organization
Negative Logits
etter        -0.16
ofilm        -0.15
asaki        -0.15
@dynamic     -0.15
etten        -0.15
923          -0.14
eren         -0.14
dera         -0.14
practition   -0.14
hots         -0.14
Positive Logits
Indian        0.22
Bombay        0.20
BT            0.20
BT            0.19
EE            0.19
Indian        0.18
Mad           0.18
awah          0.18
Pow           0.17
Kan           0.17
Activation Density: 0.020%