INDEX
Explanations
expressions of love and community
Negative Logits
Token             Logit
umb               -0.17
855               -0.16
.scalablytyped    -0.16
que               -0.15
sus               -0.15
quo               -0.15
ootball           -0.15
pawn              -0.14
ter               -0.14
sse               -0.14
Positive Logits
Token             Logit
affair            0.27
birds             0.20
joy               0.20
affairs           0.20
able              0.19
Hate              0.19
-kind             0.19
kind              0.19
/lo               0.18
eat               0.17
Activations Density: 0.086%