Explanations
intensity and exaggeration in descriptions
Negative Logits
-0.30
compañías -0.28
estratégico -0.26
poussière -0.26
supérieur -0.25
Económica -0.25
officers -0.24
ằm -0.24
potensi -0.24
relógio -0.23
Positive Logits
kasarigan 0.93
AndEndTag 0.85
writeFieldEnd 0.81
Autoritní 0.81
パンチラ 0.79
<unused47> 0.77
<unused74> 0.77
<unused41> 0.77
<unused3> 0.77
<unused28> 0.77
Activations Density 0.031%