INDEX
Explanations
references to a specific person's name
Negative Logits
netflix     -0.80
Dominion    -0.72
lift        -0.71
WARD        -0.69
jet         -0.69
jin         -0.68
hood        -0.67
cloth       -0.64
current     -0.62
Cause       -0.62
Positive Logits
ician       1.00
olit        0.88
ano         0.87
ancies      0.83
ary         0.82
inelli      0.81
opol        0.81
icians      0.80
eness       0.76
aly         0.76
Activations Density: 0.018%