Explanations
references to individuals in narrative contexts
Head Attr Weights
0:0.06
1:0.08
2:0.11
3:0.10
4:0.05
5:0.06
6:0.11
7:0.07
8:0.12
9:0.03
10:0.10
11:0.07
Negative Logits
hindsight: -1.00
Wan: -0.97
actionGroup: -0.94
Thrones: -0.91
Plex: -0.89
[undecodable token]: -0.88
CLASSIFIED: -0.88
Subtle: -0.86
wasteland: -0.86
TTL: -0.85
Positive Logits
igl: 1.23
said: 1.08
aryl: 1.02
aughed: 0.99
iannopoulos: 0.97
neau: 0.96
jc: 0.96
onga: 0.95
Said: 0.94
affer: 0.94
Activations Density 0.025%