Explanations
expressions of frustration or disillusionment with societal expectations and behaviors
Negative Logits
utton           -0.15
рахов           -0.14
rique           -0.14
ugar            -0.14
Simply          -0.14
arhus           -0.14
jist            -0.14
greater         -0.14
greater         -0.13
ImageContext    -0.13
Positive Logits
everybody        0.19
lots             0.16
št               0.15
pars             0.15
nobody           0.14
enormous         0.14
lots             0.14
gigantic         0.14
Actual           0.13
somebody         0.13
Activation density: 0.832%