INDEX
Explanations
discussions about societal issues, especially those related to different social groups, relationships, and controversial topics
concepts related to societal issues and relationships among different groups
Negative Logits
Token               Logit
arij                -0.65
partName            -0.56
Canaver             -0.54
undrum              -0.53
surprisingly        -0.51
interviewed         -0.51
answered            -0.50
SolidGoldMagikarp   -0.50
urai                -0.49
reenshots           -0.49
Positive Logits
Token               Logit
)).                 0.58
".                  0.58
%.                  0.53
]."                 0.50
undet               0.50
]).                 0.49
otherwise           0.48
$.                  0.47
".[                 0.47
.''                 0.46
Activations Density 6.799%