Explanations
themes of hypocrisy and social critique
Negative Logits
eel: -0.16
복: -0.15
(水: -0.15
thy: -0.14
onda: -0.14
eln: -0.14
ovy: -0.14
oug: -0.14
heiro: -0.14
หลวง: -0.13
Positive Logits
định: 0.15
angstrom: 0.14
psych: 0.14
اسي: 0.14
Spit: 0.14
né: 0.13
その: 0.13
ーネ: 0.13
Bod: 0.13
Temper: 0.13
Activation density: 0.032%