INDEX
Explanations
information related to professional accomplishments and affiliations
Negative Logits
abin      -0.19
yt        -0.17
aket      -0.17
gend      -0.16
commons   -0.15
itu       -0.15
aci       -0.15
akan      -0.15
fit       -0.14
sint      -0.14
Positive Logits
Society   0.25
Toast     0.24
Toast     0.23
society   0.22
American  0.22
Eta       0.22
Tau       0.22
Lambda    0.21
Phi       0.21
Mens      0.21
Activations Density: 0.112%