INDEX
Explanations
references to stories and truths, particularly in contrasting contexts
Head Attr Weights
0:0.03
1:0.02
2:0.16
3:0.07
4:0.22
5:0.03
6:0.04
7:0.16
8:0.04
9:0.03
10:0.09
11:0.07
Negative Logits
inance      -1.36
iggins      -1.32
merga       -1.29
kin         -1.27
annis       -1.25
aird        -1.25
ibu         -1.22
masters     -1.21
inence      -1.19
BILITIES    -1.19
Positive Logits
alogy       1.42
unheard     1.35
excerpts    1.34
aloud       1.30
except      1.30
arcs        1.27
anecdotes   1.27
Painter     1.25
scenes      1.25
format      1.24
Activations Density 0.008%