INDEX
Explanations
mentions of colors and nationalities
NEGATIVE LOGITS
izarre
-0.76
hran
-0.71
bably
-0.66
taker
-0.66
ETHOD
-0.65
BIL
-0.65
SPONSORED
-0.64
:{
-0.63
sonian
-0.63
conclud
-0.63
POSITIVE LOGITS
oxide
1.08
ioxide
1.07
washed
0.76
Grass
0.71
Metallic
0.70
oak
0.70
colored
0.69
stice
0.69
Ń·
0.69
thur
0.69
Activations Density 2.161%