INDEX
Explanations
specific references to primary focus areas or key subjects within discussions
Head Attr Weights
0:0.02
1:0.02
2:0.15
3:0.22
4:0.15
5:0.05
6:0.04
7:0.05
8:0.04
9:0.06
10:0.08
11:0.07
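As a rough illustration only, the weights above can be read as `head:weight` pairs and ranked to see which heads dominate the attribution. This is a minimal sketch that assumes the twelve entries are attention-head indices with their attribution weights; the names `RAW`, `parse_head_weights`, and `top_heads` are hypothetical and not part of any dashboard API.

```python
# Head attribution weights copied verbatim from the list above.
# Assumption: each line is "<head index>:<attribution weight>".
RAW = """0:0.02
1:0.02
2:0.15
3:0.22
4:0.15
5:0.05
6:0.04
7:0.05
8:0.04
9:0.06
10:0.08
11:0.07"""


def parse_head_weights(raw: str) -> dict[int, float]:
    """Turn 'head:weight' lines into a {head: weight} mapping."""
    weights = {}
    for line in raw.splitlines():
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights


def top_heads(weights: dict[int, float], k: int = 3) -> list[tuple[int, float]]:
    """Return the k largest weights, breaking ties by lower head index."""
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


weights = parse_head_weights(RAW)
total = sum(weights.values())          # the listed values sum to about 0.95, not 1
top = top_heads(weights)               # heads 3, 2 and 4
share = sum(w for _, w in top) / total # fraction of listed weight held by the top three

print(top, round(share, 3))
```

Under this reading, heads 2 through 4 carry a bit more than half of the listed weight, with head 3 the largest single contributor.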
Negative Logits
"[
-1.49
)[
-1.40
Interstitial
-1.40
aneously
-1.39
)."
-1.37
angles
-1.36
udden
-1.35
['
-1.35
?」
-1.33
('
-1.32
Positive Logits
disclaim
1.79
responsibly
1.78
Disclaimer
1.66
PDATED
1.64
THANK
1.63
OIL
1.59
optimism
1.59
cautiously
1.52
caveats
1.48
Hopefully
1.45
Activations Density 0.005%