Explanations
intersections of identities and influences
Negative Logits
discourse       0.69
discursive      0.63
epistemology    0.62
rhetorical      0.61
narrative       0.61
rhetoric        0.60
Discourse       0.59
cognit          0.57
epistem         0.57
rhet            0.56
Positive Logits
challenges      0.52
continued       0.50
continue        0.49
adients         0.47
intersect       0.47
challenge       0.44
intersections   0.44
connections     0.44
exciting        0.43
locally         0.43
Activations Density 0.008%