INDEX
Explanations
instances of the word "directed" indicating film direction
New Auto-Interp
Negative Logits
iliar
-0.17
rious
-0.17
mina
-0.17
mil
-0.15
Ìĥ
-0.15
ascar
-0.15
δεÏĤ
-0.14
lamaz
-0.14
ILLA
-0.14
mili
-0.14
POSITIVE LOGITS
اÛĮÙĩ
0.14
achsen
0.14
alth
0.14
loyalty
0.14
á»ı
0.14
sandwich
0.13
HER
0.13
imento
0.13
Une
0.13
Sandwich
0.13
Activations Density 0.005%