Explanations
key terms related to education, societal roles, and community issues
Negative Logits
...↵↵↵↵
-0.18
!↵↵↵↵
-0.17
?↵↵↵↵
-0.17
・・・↵↵
-0.16
�s
-0.15
?↵↵↵↵↵↵
-0.15
:&
-0.15
->___
-0.14
!↵↵↵↵↵↵
-0.14
😉↵↵
-0.14
Positive Logits
)
0.27
↵
0.24
))
0.21
),
0.18
]
0.18
)
0.18
"
0.18
).
0.17
)↵
0.17
respectively
0.16
Activations Density 0.249%