INDEX
Explanations
references to activism and social justice issues
Negative Logits
axter    -0.15
klady    -0.15
přitom   -0.14
lsru     -0.14
izoph    -0.14
navíc    -0.14
iterr    -0.14
.gs      -0.14
pok      -0.13
otti     -0.13
Positive Logits
alongside   0.73
junto       0.62
together    0.58
along       0.55
along       0.48
Together    0.45
Together    0.45
zusammen    0.42
cùng        0.42
samen       0.41
Activations Density 0.034%