INDEX
Explanations
phrases indicating the construction or development of something from an initial state
Head Attr Weights
0:0.02
1:0.01
2:0.17
3:0.07
4:0.20
5:0.03
6:0.09
7:0.11
8:0.06
9:0.04
10:0.09
11:0.06
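
The weights above can be ranked to see which heads dominate. The following is a minimal sketch, assuming each entry is a head index followed by that head's attribution weight (the listing does not define the term). The names `head_attr_weights` and `top_heads` are hypothetical and used only for illustration.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each "i:w" entry means head i has attribution weight w.
head_attr_weights = {
    0: 0.02, 1: 0.01, 2: 0.17, 3: 0.07, 4: 0.20, 5: 0.03,
    6: 0.09, 7: 0.11, 8: 0.06, 9: 0.04, 10: 0.09, 11: 0.06,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# The listed values are rounded, so their sum need not be exactly 1.
total = sum(head_attr_weights.values())

for head, weight in top_heads(head_attr_weights):
    print(f"head {head}: weight {weight:.2f}")
```

With the values listed here, heads 4, 2 and 7 carry the largest weights.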
Negative Logits
disapproval: -1.63
SPONSORED: -1.52
Uri: -1.48
Ips: -1.38
Bills: -1.38
dispute: -1.38
Hurricanes: -1.35
Sirius: -1.34
Nanto: -1.34
Cth: -1.32
Positive Logits
Reloaded: 1.94
ythm: 1.83
cheaply: 1.80
ntil: 1.74
FORE: 1.63
matically: 1.62
hinges: 1.61
customized: 1.60
executes: 1.57
ADRA: 1.57
Activations Density 0.001%