INDEX
Explanations
references to the city of San Francisco
instances of the phrase "San Francisco."
Negative Logits
inner       -0.66
stock       -0.65
skirts      -0.64
intrins     -0.61
Xbox        -0.60
sk          -0.58
Cyn         -0.58
cull        -0.57
Ark         -0.57
stuffing    -0.57
Positive Logits
Francisco   3.41
Diego       2.10
Jose        1.75
José        1.68
Pedro       1.66
Franc       1.64
Antonio     1.57
Fernando    1.52
Julio       1.50
Pablo       1.50
Activations Density 0.020%