INDEX
Explanations
phrases related to self-reflection and introspection
Negative Logits
onal          -0.72
heny          -0.72
rought        -0.67
olid          -0.65
Federation    -0.62
emis          -0.61
cru           -0.61
Mub           -0.59
roundup       -0.59
microsoft     -0.58
Positive Logits
龍喚士            1.05
selves            0.87
ternally          0.77
版                0.72
imei              0.70
Redd              0.70
ortium            0.69
ä»                0.69
DragonMagazine    0.68
は                0.68
Activations Density 0.961%
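
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name, the threshold parameter, and the sample data are all hypothetical and not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 1,000 token positions:
# the feature fires on 10 of them.
sample = [0.0] * 990 + [0.5] * 10
print(f"{activation_density(sample):.3%}")  # prints 1.000%
```

Under that assumption, a density of 0.961% would mean the feature is active on roughly 1 in 104 sampled tokens.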