INDEX
Explanations
references to the color white in various contexts
white followed by nouns
Negative Logits
relationship    -0.60
Journeys        -0.56
ويكيپيديا       -0.53
geschehen       -0.52
ScopeManager    -0.52
Tikang          -0.52
ukunft          -0.52
Rohy            -0.51
capulco         -0.51
decade          -0.51
Positive Logits
White     1.25
White     1.23
white     1.19
white     1.17
WHITE     1.11
Putih     1.08
WHITE     1.08
whites    0.90
Whites    0.90
Whites    0.88
Activations Density: 0.133%