INDEX
Explanations
expressions of romantic feelings and relationships
Negative Logits
ocab      -0.18
panion    -0.16
criptor   -0.16
gressor   -0.15
abo       -0.14
ostÃŃ     -0.14
obia      -0.14
çĵ        -0.14
atat      -0.14
ربع       -0.14
Positive Logits
/lang     0.16
IX        0.16
patent    0.15
ìĿĮìĿĦ    0.14
idol      0.14
dh        0.14
naz       0.14
irable    0.14
warming   0.14
ix        0.14
Activations Density 0.101%