INDEX
Explanations
words related to personal narratives or storytelling
discussions about social issues faced by individuals and communities
Negative Logits
§           -0.68
tch         -0.64
Ń·          -0.63
¿½          -0.55
bris        -0.54
whatever    -0.53
lycer       -0.53
Ļ           -0.51
arms        -0.50
İ           -0.50
Positive Logits
misconceptions    0.65
myths             0.55
differently       0.54
hypocrisy         0.53
heroism           0.53
topics            0.53
transgender       0.52
pitfalls          0.51
contemporary      0.51
firsthand         0.51
Activations Density: 1.309%