INDEX
Explanations
film director names
references to film directors
Negative Logits
Peb          -0.79
tics         -0.70
achine       -0.66
Canadians    -0.65
Ukrainians   -0.63
Blueprint    -0.62
Ukraine      -0.61
Kimmel       -0.61
orthy        -0.59
orum         -0.59
Positive Logits
igible       1.14
Dir          0.84
der          0.81
ried         0.80
iger         0.78
vana         0.78
idon         0.76
nit          0.75
icular       0.75
dir          0.75
Activations Density: 0.010%