INDEX
Explanations
phrases indicating actions or events happening in specific contexts
Head Attr Weights
0:0.02
1:0.02
2:0.11
3:0.05
4:0.14
5:0.03
6:0.12
7:0.28
8:0.03
9:0.03
10:0.06
11:0.06
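
The head attribution weights above can be read as a ranking of the heads. The following is a minimal sketch, not part of the original dashboard. The variable names are hypothetical, and the values are copied from the listing. The listed weights sum to 0.95 rather than 1, which is presumably a rounding effect (an assumption), so the sketch normalizes by the listed total when computing a share.

```python
# Head attribution weights copied from the listing (head index -> weight).
# Variable names here are illustrative, not from the original tool.
head_weights = {
    0: 0.02, 1: 0.02, 2: 0.11, 3: 0.05, 4: 0.14, 5: 0.03,
    6: 0.12, 7: 0.28, 8: 0.03, 9: 0.03, 10: 0.06, 11: 0.06,
}

# Rank heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# The listed weights do not sum to exactly 1 (assumed rounding),
# so shares are computed relative to the listed total.
total = sum(head_weights.values())
top3_heads = [head for head, _ in ranked[:3]]
top3_share = sum(weight for _, weight in ranked[:3]) / total

print(f"top head: {top_head} (weight {top_weight})")
print(f"top three heads: {top3_heads}, share of listed total: {top3_share:.2f}")
```

Head 7 carries the largest single weight (0.28), followed by heads 4 and 6.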
Negative Logits
soDeliveryDate  -1.59
arte            -1.46
aza             -1.42
catentry        -1.40
ests            -1.40
ounge           -1.37
packages        -1.36
glers           -1.36
umper           -1.35
essee           -1.33
Positive Logits
disbelief       1.83
negativity      1.68
incompetence    1.57
insults         1.46
arrogance       1.46
sparks          1.42
disappointment  1.42
irrational      1.42
goof            1.40
halluc          1.39
Activations Density 0.003%