INDEX
Explanations
references to specific movies and their elements
Negative Logits
steen      -0.19
ione       -0.16
662        -0.15
sake       -0.15
bens       -0.15
irie       -0.14
izz        -0.14
deflate    -0.14
URY        -0.14
ALSE       -0.14
Positive Logits
Terminator    0.32
TERMIN        0.32
terminator    0.31
Termin        0.30
Schwar        0.27
Ter           0.27
Arnold        0.26
termin        0.24
termin        0.24
terminated    0.23
Activations Density 0.008%