INDEX
Explanations
References to performances and their critiques in films or plays.
Actor and character names paired with descriptive roles or character introductions.
Negative Logits
awtextra    -0.44
reconnu     -0.43
udara       -0.41
dzied       -0.41
Daarna      -0.40
besk        -0.40
vanske      -0.40
mußten      -0.40
Thrones     -0.39
régal       -0.39
Positive Logits
fictional    0.70
played       0.52
wealthy      0.51
intptr       0.51
titular      0.51
corrupt      0.51
played       0.51
subplot      0.50
fictitious   0.50
neurotic     0.50
Activations Density 0.484%
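
The page does not define how the density figure is calculated. A minimal sketch, assuming it is the percentage of sampled tokens on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up activations for one feature over 1,000 tokens, 5 of them nonzero.
sample = [0.0] * 995 + [1.2, 0.3, 2.5, 0.8, 4.1]
print(f"{activation_density(sample):.3f}%")  # prints 0.500%
```

Under this reading, a density of 0.484% would mean the feature is active on roughly 5 of every 1,000 tokens, which fits a sparse feature tied to a narrow topic such as cast and character descriptions.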