Explanations
white in different languages
NEGATIVE LOGITS
Token        Logit
DARK         0.70
Dark         0.68
dark         0.66
Dark         0.64
DARK         0.61
blackberry   0.61
Purple       0.59
darker       0.59
dark         0.57
шокола       0.55
POSITIVE LOGITS
Token        Logit
white        2.16
white        2.00
白           1.89
White        1.87
White        1.87
WHITE        1.87
белый        1.82
WHITE        1.80
सफेद         1.77
白色         1.77
Activations Density: 0.131%