Explanations
hollywood movies and industry figures
Negative Logits
ل    0.94
ac   0.85
as   0.74
ي    0.74
П    0.72
lL   0.71
id   0.70
au   0.70
ol   0.69
atr  0.68
POSITIVE LOGITS
ти         0.84
movies     0.73
uncanny    0.64
,’         0.58
Movies     0.57
↵          0.56
Hollywood  0.55
фильма     0.54
</h2>      0.52
ยก         0.52
Activations Density 0.002%