Explanations
references to specific web domains and URLs
Head Attr Weights
0:0.05
1:0.04
2:0.12
3:0.13
4:0.16
5:0.05
6:0.13
7:0.03
8:0.04
9:0.07
10:0.07
11:0.04
Negative Logits
outwe        -1.57
bribes       -1.57
criminals    -1.48
Joker        -1.47
intact       -1.46
lies         -1.45
prostitutes  -1.42
selves       -1.41
bumps        -1.36
SHALL        -1.35
Positive Logits
ipedia       2.16
info         2.05
library      2.00
archive      1.95
pedia        1.91
ipl          1.79
useum        1.78
utorial      1.67
web          1.66
Blog         1.66
Activations Density 0.023%