INDEX
Explanations
terms related to dramatic content and themes in television and film
Negative Logits
boro
-0.18
istrovství
-0.17
latter
-0.16
è¦
-0.15
ieg
-0.15
ilver
-0.14
tplib
-0.14
oo
-0.14
птом
-0.14
entially
-0.14
Positive Logits
틱
0.23
queen
0.21
atic
0.21
queens
0.20
queen
0.19
llama
0.19
/com
0.18
atur
0.18
-document
0.18
/action
0.17
Activations Density 0.019%