INDEX
Explanations
descriptions of violent or dramatic actions
Negative Logits
assi          -0.08
escorte       -0.07
isel          -0.06
agnost        -0.06
robat         -0.06
strcasecmp    -0.06
statewide     -0.06
otto          -0.06
Sudoku        -0.06
цвет          -0.06
Positive Logits
stunt         0.10
fake          0.10
actor         0.10
actors        0.10
actresses     0.10
fake          0.09
actress       0.09
Actor         0.08
Fake          0.08
Fake          0.08
Activations Density 0.009%