INDEX
Explanations
discussions related to critiques of societal norms and progress narratives
Negative Logits
ulo            -0.15
rama           -0.15
ULO            -0.14
unpredictable  -0.14
qua            -0.14
èĮĤ            -0.14
kle            -0.13
enate          -0.13
iad            -0.13
лива           -0.13
Positive Logits
overs          0.23
misses         0.20
shorts         0.18
simplistic     0.18
faulty         0.17
missing        0.17
Simpl          0.16
fall           0.16
Prem           0.15
dated          0.15
Activations Density: 0.336%