INDEX
Explanations
specific terms related to controversial or challenging societal issues
Negative Logits
Platform      -0.73
Kinnikuman    -0.66
ulative       -0.65
arial         -0.65
amina         -0.64
Staten        -0.64
Chinatown     -0.64
ortment       -0.64
Hok           -0.63
Amendments    -0.63
Positive Logits
ishly      0.76
ensibly    0.75
ecause     0.75
citiz      0.73
<-         0.70
etheless   0.70
\",        0.68
imposed    0.67
ifully     0.67
vation     0.66
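The two lists rank vocabulary tokens by a per-token score, presumably the feature's direct effect on the output logits. The following is a minimal sketch of how dashboards of this kind commonly produce such lists. It assumes, because the page does not say, that each score is the feature's decoder direction dotted with the corresponding column of the model's unembedding matrix. `logit_effects` and `top_and_bottom` are hypothetical helper names, and the inputs are toy values rather than data from this feature.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each vocabulary column.

    Assumption (hypothetical layout): unembed has one row per model
    dimension and one column per vocabulary token (d_model x n_vocab).
    """
    return {
        token: sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j, token in enumerate(vocab)
    }


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_positive, most_negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (hypothetical values).
    vocab = ["a", "b", "c"]
    unembed = [[0.5, -0.2, 0.1],
               [1.0, 1.0, 1.0]]
    effects = logit_effects([1.0, 0.0], unembed, vocab)
    print(top_and_bottom(effects, 1))  # prints ([('a', 0.5)], [('b', -0.2)])
```

Real pipelines may also fold in the final layer norm or subtract a mean over the vocabulary, so treat this only as the basic projection.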
Activation density: 13.446%
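Under the usual reading of this figure (an assumption here, since the page gives no definition), activation density is the percentage of sampled tokens on which the feature's activation is above zero. A value of 13.446% would then mean the feature fires on roughly one token in 7.4. The sketch below computes that percentage. `activation_density` is a hypothetical helper, and the activations are invented toy values.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all (activation > 0 for a ReLU-style SAE).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical activations over eight tokens; three are positive.
    acts = [0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.05, 0.0]
    print(f"{activation_density(acts):.3f}%")  # prints 37.500%
```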