INDEX
Explanations
themes related to social justice and equality
Negative Logits
Token          Logit
552            -0.14
atcher         -0.14
pedia          -0.14
ziel           -0.12
Instructions   -0.12
亜             -0.12
ους            -0.12
sis            -0.12
pairs          -0.12
�性            -0.12
Positive Logits
Token          Logit
these          0.76
这些           0.66
these          0.65
These          0.59
These          0.58
THESE          0.57
этих           0.45
těchto         0.43
estos          0.41
bunlar         0.41
Activations Density 0.992%