Explanations
mentions of television shows and their hosts
Negative Logits
ſind        -0.92
itſelf      -0.84
.",         -0.83
poffible    -0.81
iſt         -0.81
raiſ        -0.81
WebServlet  -0.80
faſt        -0.80
ſelves      -0.79
houſe       -0.78
Positive Logits
....        0.81
?           0.81
...         0.79
.....       0.74
I           0.73
…           0.73
            0.70
??          0.70
….          0.66
↵           0.64
Activations Density 0.356%