INDEX
Explanations
references to diversity in various contexts
mentions of diversity or varied groups of people/things
Negative Logits
çͰ        -0.74
rol       -0.74
WARD      -0.73
FORE      -0.73
ENA       -0.71
WAR       -0.68
Ob        -0.66
clicked   -0.65
Removed   -0.63
rollers   -0.63
Positive Logits
ively         0.99
ortment       0.92
mble          0.89
iveness       0.89
assemb        0.88
iated         0.86
perspectives  0.86
genders       0.85
avenues       0.84
viewpoints    0.83
Activations Density 0.027%