Explanations
phrases that denote the presence of a subject or entity, often at the beginning of sentences
Head Attr Weights
0:0.03
1:0.03
2:0.06
3:0.21
4:0.07
5:0.05
6:0.17
7:0.03
8:0.06
9:0.09
10:0.09
11:0.05
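The head attribution weights listed above can be parsed and ranked with a few lines of Python. This is only a minimal sketch: the `head:weight` line format is taken from the listing, while the names `raw`, `parse_head_weights`, `ranked`, and `total` are hypothetical, and reading the keys 0 to 11 as attention-head indices is an assumption.

```python
# Head attribution weights copied verbatim from the listing above.
raw = """0:0.03
1:0.03
2:0.06
3:0.21
4:0.07
5:0.05
6:0.17
7:0.03
8:0.06
9:0.09
10:0.09
11:0.05"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a {head_index: weight} dict."""
    weights = {}
    for line in text.strip().splitlines():
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(raw)

# Rank heads from largest to smallest attribution weight.
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

# Sum of the displayed (rounded) weights; it need not be exactly 1.
total = sum(weights.values())

print(ranked[:2])
print(round(total, 2))
```

On these values, heads 3 and 6 carry the largest weights (0.21 and 0.17), and the displayed weights sum to 0.94, which suggests the values were rounded for display.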
Negative Logits
quir          -1.34
hotly         -1.25
separately    -1.21
withd         -1.20
latter        -1.17
accordingly   -1.16
plunge        -1.15
stricken      -1.13
snowy         -1.12
subsequent    -1.12
Positive Logits
")     2.63
"]     2.55
"),    2.55
"))    2.53
.")    2.34
"?     2.22
"],    2.19
").    2.19
…"     2.18
");    2.15
Activations Density 0.048%
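
To put the density figure in concrete terms, the short sketch below converts the percentage into a fraction and an approximate firing rate. It assumes that "Activations Density" means the share of sampled tokens on which the feature has a nonzero activation; the source does not define the term, and the variable names are hypothetical.

```python
# Density as reported in the listing, in percent.
density_percent = 0.048

# Assumption: density = fraction of tokens with nonzero activation.
density_fraction = density_percent / 100

# Average number of tokens between activations under that assumption.
tokens_per_activation = 1 / density_fraction

# Expected number of activating tokens in a hypothetical 1M-token sample.
expected_hits_per_million = density_fraction * 1_000_000

print(density_fraction)
print(round(tokens_per_activation))
print(round(expected_hits_per_million))
```

Under this reading, the feature fires on roughly one token in 2,083, or about 480 tokens per million.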