Explanations
themes related to self-improvement and social validation
Negative Logits
iw        -0.15
alta      -0.14
YLES      -0.14
Kam       -0.14
quest     -0.14
ÙĪØ«      -0.14
Chatt     -0.14
ÑĤон      -0.13
amon      -0.13
çĻº       -0.13
Positive Logits
icho           0.17
εξ             0.16
oha            0.15
èĨľ            0.15
erchant        0.14
è¯Ŀ            0.14
agini          0.14
aso            0.14
ernes          0.14
.ObjectModel   0.14
Activations Density: 0.305%