INDEX
Explanations
names of political figures and international affairs
Negative Logits
cryst                -0.66
APD                  -0.65
oats                 -0.63
TextColor            -0.62
Klux                 -0.61
subtract             -0.61
natureconservancy    -0.59
Effective            -0.59
romising             -0.58
                     -0.58
Positive Logits
shi         0.93
icz         0.91
uala        0.84
ny          0.83
wu          0.82
Petersen    0.81
ewski       0.80
ius         0.79
oglu        0.79
quez        0.79
Activations Density 0.122%