Explanations
phrases expressing intentions or actions
Head Attr Weights
0:0.04
1:0.01
2:0.07
3:0.22
4:0.05
5:0.02
6:0.17
7:0.12
8:0.04
9:0.04
10:0.11
11:0.06
Negative Logits
></         -1.58
eteria      -1.57
['          -1.51
encers      -1.51
meanwhile   -1.48
gangs       -1.47
Corinth     -1.47
adultery    -1.44
rolls       -1.44
tow         -1.41
Positive Logits
ruby        1.66
ordial      1.65
Satoshi     1.53
oulos       1.48
sat         1.44
mosqu       1.43
IPM         1.38
opio        1.38
RELE        1.38
insightful  1.36
Activations Density 0.002%