Explanations
Attends from later tokens back to the sources of information that precede them, suggesting a link between stated insights or opinions and what is said afterward.
Head Attr Weights
Head    Weight
0       0.06
1       0.07
2       0.06
3       0.09
4       0.09
5       0.04
6       0.44
7       0.10
Negative Logits
Token       Logit
timus       -0.36
adina       -0.32
还好        -0.31
ADORA       -0.31
—           -0.29
omány       -0.29
venty       -0.28
鍊          -0.28
ktive       -0.28
myname      -0.28
Positive Logits
Token             Logit
defaultstate      0.44
__':              0.43
RTSC              0.41
FunctionFlags     0.40
ArgumentParser    0.39
IsMutable         0.39
['./              0.39
complexContent    0.38
WriteBarrier      0.38
oprot             0.37
Activations Density: 0.076%