INDEX
Explanations
actions related to collaboration and interaction
Head Attr Weights
0:0.02
1:0.01
2:0.09
3:0.33
4:0.12
5:0.02
6:0.02
7:0.09
8:0.03
9:0.04
10:0.08
11:0.09
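
As a rough illustration of how the head attribution weights above might be read, the sketch below parses the `head:weight` pairs and picks out the head with the largest weight. The variable names (`raw`, `weights`, `top_head`, `total`) are hypothetical, and the sketch assumes the listed values are rounded, so they need not sum to exactly 1.

```python
# Head attribution weights copied from the listing above (head index : weight).
raw = "0:0.02 1:0.01 2:0.09 3:0.33 4:0.12 5:0.02 6:0.02 7:0.09 8:0.03 9:0.04 10:0.08 11:0.09"

# Parse each "head:weight" item into an int -> float mapping.
weights = {int(h): float(w) for h, w in (item.split(":") for item in raw.split())}

# Head with the largest attribution weight.
top_head = max(weights, key=weights.get)

# Sum of the listed weights, rounded to the precision of the listing.
total = round(sum(weights.values()), 2)

print(top_head, weights[top_head], total)
```

Here head 3 carries the largest single weight (0.33), with head 4 (0.12) next.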
Negative Logits
exempt  -1.53
Origin  -1.45
Insert  -1.42
Internal  -1.37
<-  -1.36
objects  -1.34
Entered  -1.33
uphem  -1.33
Account  -1.33
エル  -1.31
Positive Logits
tomorrow  1.99
someday  1.85
morrow  1.79
.","  1.74
️  1.70
izons  1.63
hopefully  1.62
responsibly  1.60
anooga  1.57
goodbye  1.56
Activations Density 0.032%