INDEX
Explanations
Phrases that express admiration or positive sentiment toward a topic, using a specific grammatical structure involving articles and exclamatory expressions.
Negative Logits
atura        -0.17
Shed         -0.15
oku          -0.15
undy         -0.15
Graz         -0.14
pute         -0.14
ista         -0.14
ÑĨÑĥ         -0.14
273          -0.14
ắt           -0.14
Positive Logits
difference   0.17
ehr          0.17
pity         0.17
eck          0.16
shame        0.16
contrast     0.16
contrast     0.16
coincidence  0.15
waste        0.15
sis          0.15
Activations Density 0.010%