INDEX
Explanations
names, brands, and organizations linked to social or political issues
Negative Logits
finger         -0.18
rial           -0.18
sted           -0.17
寿             -0.16
FTP            -0.16
Kob            -0.16
dle            -0.15
.EventQueue    -0.15
cob            -0.15
cale           -0.15
Positive Logits
Rock           0.66
rock           0.58
Rock           0.57
ROCK           0.51
rock           0.51
-rock          0.50
rocks          0.40
rocking        0.39
岩             0.38
Rocks          0.37
Activations Density: 0.036%