INDEX
Explanations
themes related to various forms of societal and cultural structures
Negative Logits
Yue           -0.15
’Ñıз          -0.14
osas          -0.14
aras          -0.14
.shiro        -0.14
ãģłãģijãģ§    -0.13
acons         -0.13
stras         -0.13
elan          -0.13
zÄħd          -0.13
Positive Logits
-like         0.73
like          0.52
-esque        0.50
-style        0.47
-type         0.41
LIKE          0.36
style         0.35
_like         0.35
èά            0.34
type          0.32
Activations Density 0.499%