INDEX
Explanations
phrases indicating future possibilities or anticipated events
Head Attr Weights
Head  Weight
0     0.01
1     0.01
2     0.13
3     0.05
4     0.13
5     0.02
6     0.03
7     0.37
8     0.02
9     0.03
10    0.06
11    0.08
Negative Logits
Token       Logit
oqu         -1.75
mouth       -1.53
apologised  -1.44
guiActive   -1.44
joked       -1.44
boro        -1.42
apologized  -1.41
nicknamed   -1.38
nickname    -1.34
rex         -1.30
Positive Logits
Token          Logit
viability      1.51
abil           1.49
salvation      1.45
Pascal         1.44
nesota         1.37
probabilities  1.36
sanity         1.31
Sparks         1.30
elusive        1.30
2019           1.30
Activations Density: 0.001%