INDEX
Explanations
phrases that express irony or contradiction
NEGATIVE LOGITS
ographics   -0.16
adolu       -0.15
amas        -0.15
.SC         -0.15
inz         -0.15
Bones       -0.14
yw          -0.14
herits      -0.14
aley        -0.14
åģ          -0.14
POSITIVE LOGITS
665         0.16
عان         0.16
っぱ        0.15
PRI         0.15
Mons        0.15
stile       0.15
/pi         0.14
PRI         0.14
shima       0.14
Nun         0.14
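The logit lists above are usually read as the feature's direct effect on the output vocabulary. A minimal sketch, assuming the scores come from projecting the feature's decoder vector through the model's unembedding matrix and ranking the result (the dashboard does not state its exact method). `logit_effects` and `top_bottom` are hypothetical helpers, and the toy matrix is illustrative, not real model weights:

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by the dot product of the feature's
    decoder vector with that token's unembedding column.

    Assumed (hypothetical) layout: unembed has len(decoder_vec) rows
    (d_model) and one column per vocabulary token.
    """
    d_model = len(decoder_vec)
    vocab_size = len(unembed[0])
    return [sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab_size)]


def top_bottom(effects: Sequence[float], tokens: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 tokens (illustrative values only).
tokens = ["a", "b", "c", "d"]
decoder_vec = [1.0, -1.0]
unembed = [[0.5, 0.25, -0.25, 0.0],
           [0.25, 0.5, 0.25, -0.75]]
effects = logit_effects(decoder_vec, unembed)
positive, negative = top_bottom(effects, tokens, k=2)
print("POSITIVE", positive)  # [('d', 0.75), ('a', 0.25)]
print("NEGATIVE", negative)  # [('c', -0.5), ('b', -0.25)]
```

On a real model the same ranking would run over the full vocabulary, which is why fragments such as `ographics` or `/pi` can surface: the projection scores raw tokens, not words.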
Activations Density 0.192%
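Activation density most likely means the share of sampled tokens on which the feature fires. A minimal sketch under that assumption (the dashboard states neither its firing threshold nor its sample size; `activation_density` and the zero threshold are hypothetical):

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 192 firing tokens out of 100,000 reproduces the 0.192% shown above.
acts = [0.0] * 99_808 + [1.0] * 192
print(f"{activation_density(acts):.3f}%")  # 0.192%
```

Under this reading, a density of 0.192% means the feature fires on roughly 1 token in 520, so it is a sparse feature.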