INDEX
Explanations
themes related to social justice and equality
Negative Logits
olsun
-0.15
Ulus
-0.14
/generated
-0.14
ОР
-0.14
é¨
-0.14
še
-0.14
caret
-0.14
gne
-0.14
ernet
-0.14
Fried
-0.13
Positive Logits
still
0.58
Still
0.54
still
0.52
Still
0.50
STILL
0.47
仍
0.42
ainda
0.37
masih
0.36
ancora
0.35
هنوز
0.35
Activations Density 0.216%