INDEX
Explanations
expressions of uncertainty and conditions related to potential issues
Head Attr Weights
0:0.04
1:0.06
2:0.01
3:0.11
4:0.03
5:0.07
6:0.06
7:0.05
8:0.08
9:0.39
10:0.02
11:0.02
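
As a rough illustration of how the head attribution weights might be read, the sketch below copies the twelve listed values into a dictionary and picks out the head with the largest weight. The names `head_attr_weights` and `dominant_head` are hypothetical and not part of any dashboard API. The rounded values shown sum to 0.94, so the sketch divides by that total rather than assuming the weights sum to 1.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# The variable and function names are illustrative assumptions.
head_attr_weights = {
    0: 0.04, 1: 0.06, 2: 0.01, 3: 0.11,
    4: 0.03, 5: 0.07, 6: 0.06, 7: 0.05,
    8: 0.08, 9: 0.39, 10: 0.02, 11: 0.02,
}


def dominant_head(weights):
    """Return the (head index, weight) pair with the largest weight."""
    return max(weights.items(), key=lambda item: item[1])


top_head, top_weight = dominant_head(head_attr_weights)

# Sum of the displayed (rounded) weights; not assumed to equal 1.
total = sum(head_attr_weights.values())

# Share of the listed attribution carried by the dominant head.
share = top_weight / total

print(f"dominant head: {top_head}, weight: {top_weight:.2f}, share: {share:.1%}")
```

With these values, head 9 carries the largest single weight, more than three times that of the next head (head 3 at 0.11).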
Negative Logits
oké            -2.08
Nap            -1.83
Gutenberg      -1.79
amera          -1.76
raham          -1.72
Bay            -1.70
Nor            -1.70
actionGroup    -1.69
Only           -1.68
Calm           -1.68
Positive Logits
unnecessarily  2.47
prematurely    2.24
unexpectedly   2.21
overly         2.20
improperly     2.18
nonetheless    2.14
oris           2.08
impractical    2.06
entimes        2.04
conflicted     2.04
Activations Density: 0.235%