Explanations
references to popular television shows and their seasons
Head Attr Weights
0:0.04
1:0.05
2:0.06
3:0.07
4:0.06
5:0.07
6:0.04
7:0.05
8:0.05
9:0.10
10:0.20
11:0.15
Negative Logits
Watt:-1.79
Tur:-1.58
Wiz:-1.53
Hew:-1.48
Gillespie:-1.44
NVIDIA:-1.41
Kidd:-1.41
Keyboard:-1.40
Blackburn:-1.38
Powered:-1.36
Positive Logits
iatrics:1.66
elfare:1.63
girls:1.62
untarily:1.57
覚醒:1.47
paramedics:1.47
sinners:1.46
prostitutes:1.43
Veterinary:1.42
nurses:1.42
Activations Density 0.002%