Explanations
references to locations and specific entities involved in historical contexts
Negative Logits
Token       Logit
Parisian    -1.13
CANADIAN    -1.11
Ukrainian   -1.10
Brazilian   -1.10
Norwegian   -1.10
Mexican     -1.09
Canadian    -1.09
Scottish    -1.09
Welsh       -1.06
Berliner    -1.05
Positive Logits
Token       Logit
India       1.08
Italy       1.06
Germany     0.99
France      0.97
Spain       0.95
Ireland     0.95
Japan       0.94
Nigeria     0.93
Australia   0.93
Canada      0.92
Activations Density: 0.755%