Explanations
phrases identifying different literary forms
Negative Logits
literature    -0.21
Literature    -0.21
.appspot      -0.17
oven          -0.16
æĸĩåѦ        -0.16
elow          -0.15
ader          -0.15
CLS           -0.14
emale         -0.14
ä½ľåĵģ        -0.14
Positive Logits
ek            0.28
vill          0.19
ha            0.19
flash         0.18
son           0.18
persona       0.18
hybrids       0.17
tank          0.17
journalism    0.17
found         0.17
Activations Density 0.072%