INDEX
Explanations
mentions of different nationalities
Negative Logits
Token        Logit
utherford    -0.80
odder        -0.80
anship       -0.80
track        -0.76
eus          -0.75
regor        -0.75
uably        -0.74
ully         -0.73
ording       -0.73
mble         -0.73
Positive Logits
Token        Logit
nationals    1.11
apolis       1.05
Nadu         1.01
Embassy      0.98
istani       0.97
oslov        0.95
citizen      0.94
embassy      0.92
diplomats    0.89
soil         0.89
Activations Density: 0.773%