INDEX
Explanations
various forms of dialogue or speech in the text
Head Attr Weights
0:0.03
1:0.01
2:0.12
3:0.13
4:0.04
5:0.02
6:0.17
7:0.08
8:0.04
9:0.07
10:0.13
11:0.12
Negative Logits
Pg: -1.52
̶: -1.42
airports: -1.38
neighboring: -1.36
HB: -1.34
Rapids: -1.31
enc: -1.30
spawns: -1.29
Erie: -1.27
Nort: -1.26
Positive Logits
NetMessage: 1.77
java: 1.59
cffff: 1.57
cknow: 1.55
truth: 1.48
Quote: 1.47
sincerity: 1.45
Activity: 1.44
ldom: 1.42
reply: 1.41
Activations Density: 0.010%