Explanations
concepts related to criticism and social commentary
Negative Logits
Token        Logit
yr           -0.17
aurant       -0.16
ted          -0.16
ÏħÏĩ         -0.15
adge         -0.15
ка           -0.14
illery       -0.14
æ§           -0.14
integrity    -0.14
situ         -0.14
Positive Logits
Token           Logit
heimer          0.17
BOTTOM          0.16
ookies          0.15
oggler          0.15
kker            0.15
ienes           0.15
Descriptions    0.15
uba             0.14
æ¸Ī             0.14
chine           0.14
Activations Density: 0.535%