Explanations
mentions of the genre of media or literature
Negative Logits
ors        -0.24
loo        -0.17
cie        -0.16
enberg     -0.16
am         -0.16
-moving    -0.14
enie       -0.14
mont       -0.14
burn       -0.14
dụng       -0.14
Positive Logits
osate      0.17
ial        0.16
ieten      0.15
ově        0.15
/type      0.15
šek        0.15
別         0.15
鐵         0.14
-toggler   0.14
OfFile     0.14
Activations Density 0.020%