INDEX
Explanations
elements associated with film critiques and narrative elements
Negative Logits
آخر        -0.18
ائز        -0.16
emaakt     -0.16
ilos       -0.16
icher      -0.16
ommen      -0.16
achsen     -0.15
zdrav      -0.15
salopes    -0.15
inois      -0.15
Positive Logits
rende      0.28
elige      0.28
ige        0.27
ationale   0.26
uelle      0.25
liche      0.24
rale       0.23
atische    0.23
ende       0.23
iale       0.22
Activations Density: 0.047%