INDEX
Explanations
phrases that express uncertainty or speculation about events and outcomes
Head Attr Weights
0:0.02
1:0.02
2:0.08
3:0.25
4:0.14
5:0.04
6:0.07
7:0.11
8:0.04
9:0.04
10:0.06
11:0.09
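
The head attribution weights above can be read as `head:weight` pairs. A minimal sketch for ranking them follows, assuming each entry is one attention head's share of the attribution; the dashboard itself does not define the weights, so that reading is an assumption. The names `RAW`, `parse_head_weights`, and `rank_heads` are hypothetical helpers introduced only for illustration.

```python
# Head attribution weights copied from the list above as "head:weight" pairs.
RAW = (
    "0:0.02 1:0.02 2:0.08 3:0.25 4:0.14 5:0.04 "
    "6:0.07 7:0.11 8:0.04 9:0.04 10:0.06 11:0.09"
)


def parse_head_weights(raw: str) -> dict[int, float]:
    """Parse whitespace-separated "head:weight" pairs into a dict."""
    weights: dict[int, float] = {}
    for pair in raw.split():
        head, weight = pair.split(":")
        weights[int(head)] = float(weight)
    return weights


def rank_heads(weights: dict[int, float]) -> list[tuple[int, float]]:
    """Return (head, weight) pairs sorted by weight, largest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


weights = parse_head_weights(RAW)
ranked = rank_heads(weights)
total = sum(weights.values())

# Head 3 carries the largest listed weight (0.25), followed by head 4 (0.14).
top_head, top_weight = ranked[0]
print(f"top head: {top_head} ({top_weight:.2f}), sum of weights: {total:.2f}")
```

The listed values sum to 0.96 rather than 1.00, which is consistent with each weight having been rounded to two decimals, although the source does not say so.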
Negative Logits
76561           -1.94
Origin          -1.55
untarily        -1.45
src             -1.40
ル              -1.40
uci             -1.37
Quote           -1.32
Neuroscience    -1.31
ドラゴン        -1.30
Celt            -1.30
Positive Logits
?'"     1.82
!?      1.64
!?"     1.62
?!"     1.60
?!      1.58
!'"     1.52
.'"     1.49
.<      1.46
.''.    1.46
.</     1.44
Activations Density 0.001%