INDEX
Explanations
references to the name "Justin" or "Bieber."
Negative Logits
ktop          -0.76
wark          -0.74
extremes      -0.70
mileage       -0.69
rums          -0.66
exempt        -0.66
confinement   -0.66
ansas         -0.64
womb          -0.63
unda          -0.63
Positive Logits
Bieber        1.38
Timber        1.18
Trudeau       1.02
Vernon        0.91
Upton         0.90
Wong          0.85
Hayward       0.84
Bour          0.84
Justin        0.83
onymous       0.82
Activations Density 0.006%