Explanations
phrases indicating progression or enhancement in context
Head Attr Weights
Head  Weight
0     0.02
1     0.02
2     0.11
3     0.06
4     0.13
5     0.02
6     0.05
7     0.36
8     0.03
9     0.03
10    0.07
11    0.04
Negative Logits
Token                 Logit
event                 -1.45
Feast                 -1.42
input                 -1.40
Signed                -1.39
archived              -1.38
guiActiveUnfocused    -1.38
rotting               -1.37
da                    -1.37
updated               -1.35
efe                   -1.35
Positive Logits
Token                 Logit
understanding         1.86
uggest                1.84
redef                 1.78
furthe                1.75
estab                 1.61
misunderstand         1.59
stereotypes           1.56
VIS                   1.55
ixel                  1.55
embraces              1.53
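The two logit lists are consistent with ranking every vocabulary token by how much the feature's decoder direction raises or lowers that token's logit. The sketch below assumes the ranking is a plain dot product of the decoder direction with the unembedding matrix (ignoring any final LayerNorm scaling). `logit_effects` and `top_bottom_tokens` are hypothetical helpers, not part of any dashboard API.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction pushes their logits, assuming logit effect = W_U^T d
# (no LayerNorm or bias terms are modeled here).
from typing import List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: List[List[float]]) -> List[float]:
    """Dot the decoder direction with each token's unembedding column.

    unembed[i][t] is the weight from residual dimension i to token t
    (shape d_model x vocab_size), so the result has one value per token.
    """
    vocab_size = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
            for t in range(vocab_size)]


def top_bottom_tokens(effects: List[float], vocab: List[str], k: int = 10
                      ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    # Most negative first, matching the "Negative Logits" ordering.
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    # Most positive first, matching the "Positive Logits" ordering.
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive
```

With `k=10` this yields two ten-row lists in the same shape as the tables above; the actual values depend on the real model weights, which are not given here.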
Activation Density: 0.004%
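Activation density is the share of sampled token positions on which the feature is active, so 0.004% corresponds to roughly 4 activating positions per 100,000 tokens. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0 (the `threshold` parameter and `activation_density` name are hypothetical):

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature fires above threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# 4 firing positions out of 100,000 tokens -> 4e-05, i.e. 0.004%
density = activation_density([0.0] * 99_996 + [1.0] * 4)
print(f"{density * 100:.3f}%")  # prints "0.004%"
```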