Explanations
Phrases and metaphors related to superficial improvements or rebranding
Negative Logits
TypeInfo   -0.17
icare      -0.16
ekk        -0.15
цез        -0.15
unci       -0.15
uder       -0.15
jour       -0.15
ilar       -0.15
efon       -0.14
ermo       -0.14
Positive Logits
fac         0.15
owski       0.15
zi          0.15
СО          0.15
Kou         0.14
olves       0.14
iani        0.14
osl         0.14
ango        0.14
ГО          0.14
Activation Density: 0.256%