Explanations
questions related to decision-making, planning, and evaluation of performance in various contexts
Head Attr Weights
0:0.12
1:0.04
2:0.07
3:0.19
4:0.06
5:0.07
6:0.04
7:0.06
8:0.07
9:0.09
10:0.07
11:0.08
Negative Logits
↵          -1.50
ribune     -1.42
doesnt     -1.40
rompt      -1.32
pmwiki     -1.30
File       -1.25
wx         -1.22
):         -1.22
padd       -1.22
laughs     -1.20
Positive Logits
)?              1.67
?'"             1.53
'?              1.50
?               1.45
emia            1.39
?'              1.33
?).             1.30
interstitial    1.30
vouchers        1.26
assistants      1.20
Activations Density 0.092%