Explanations
references to various school sports teams, specifically focusing on gender and team categories
Negative Logits
raphics      -0.16
oder         -0.16
atable       -0.16
ibur         -0.14
relude       -0.14
iens         -0.14
orges        -0.14
باش          -0.13
ató          -0.13
avs          -0.13
Positive Logits
folk         0.20
astr         0.15
Means        0.15
otti         0.15
-only        0.15
fol          0.15
reau         0.15
OwnProperty  0.15
rape         0.14
à¸Ń          0.14
Activations Density 0.020%