INDEX
Explanations
names of movies or shows
proper nouns, specifically titles and names related to movies, shows, or notable works
Negative Logits
prompting     -0.69
ÄŁ            -0.68
�             -0.65
ende          -0.61
listed        -0.61
confir        -0.59
--------      -0.59
separately    -0.58
Pry           -0.57
Positive Logits
")      1.36
").     1.35
"),     1.30
%"      1.22
"]      1.22
",      1.16
";      1.16
"       1.14
"?      1.14
");     1.13
Activations Density 0.237%