Explanations
references to burgers and similar food items
Negative Logits
Token       Logit
Rine        -0.70
Tj          -0.66
—————       -0.64
OAS         -0.64
Kao         -0.62
Amon        -0.61
EPO         -0.61
Bén         -0.59
Whit        -0.59
remporté    -0.58
Positive Logits
Token       Logit
Burg        1.73
Burg        1.57
Burger      1.55
Burgers     1.54
Burger      1.41
burgers     1.38
Burgess     1.37
burger      1.31
burger      1.30
Burgos      1.27
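The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The following is a minimal sketch of how such a ranking could be computed. It assumes (the source does not say this) that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. Real dashboards may also apply layer-norm scaling or other adjustments. The names top_logit_tokens, decoder_dir and unembed are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_logit_tokens(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by a feature's (assumed) direct logit effect.

    decoder_dir: the feature's decoder vector, length d_model (hypothetical input).
    unembed: unembedding matrix stored as d_model rows x vocab_size columns.
    vocab: token strings, one per unembedding column.
    Returns (top-k positive, top-k negative) lists of (token, score) pairs.
    """
    d_model = len(decoder_dir)
    # Assumption: score for token j = decoder_dir . W_U[:, j], no normalization.
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    scored = list(zip(vocab, effects))
    positive = sorted(scored, key=lambda t: t[1], reverse=True)[:k]
    negative = sorted(scored, key=lambda t: t[1])[:k]
    return positive, negative


# Toy usage with a 2-dimensional model and a 3-token vocabulary.
pos, neg = top_logit_tokens(
    [1.0, 0.5],
    [[1.0, -0.5, 0.0], [0.5, -0.4, 0.1]],
    ["Burg", "Rine", "the"],
    k=1,
)
print(pos, neg)
```

In this toy example "Burg" receives the largest positive score and "Rine" the most negative, which mirrors how the two lists are arranged.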
Activation Density: 0.008%
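A density of 0.008% corresponds to a fraction of 8 × 10⁻⁵, or roughly 8 activating tokens per 100,000. The sketch below assumes, without confirmation from the source, that density means the fraction of tokens in some evaluation dataset on which the feature's activation is above zero. The function name activation_density and its threshold parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("activations must be non-empty")
    return firing / total


# 8 firing tokens out of 100,000 gives a density of about 0.008%.
acts = [0.0] * 99_992 + [1.0] * 8
print(f"{activation_density(acts):.3%}")
```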