Explanations
phrases that indicate the presence or existence of something
Head Attr Weights
Head    Weight
0       0.02
1       0.02
2       0.09
3       0.06
4       0.14
5       0.02
6       0.04
7       0.29
8       0.02
9       0.04
10      0.08
11      0.13
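The head attribution weights listed above can be ranked to see which attention heads contribute most to this feature. The following is a minimal sketch only: the values are copied from the listing, while the names `head_weights` and `rank_heads` are hypothetical and not part of any dashboard API.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# The variable and function names are illustrative assumptions.
head_weights = {
    0: 0.02, 1: 0.02, 2: 0.09, 3: 0.06,
    4: 0.14, 5: 0.02, 6: 0.04, 7: 0.29,
    8: 0.02, 9: 0.04, 10: 0.08, 11: 0.13,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda pair: pair[1], reverse=True)


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]
total = sum(head_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

On the listed values, head 7 carries the largest weight (0.29), followed by heads 4 and 11. The listed weights sum to roughly 0.95 rather than 1, which may simply reflect rounding to two decimals.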
Negative Logits
Token        Logit
ilage        -1.49
appre        -1.46
endeav       -1.35
endeavors    -1.35
ende         -1.34
aciously     -1.32
ancial       -1.25
NBC          -1.25
RG           -1.25
��           -1.22
Positive Logits
Token          Logit
Reviewer       1.45
abo            1.34
oneself        1.32
mite           1.31
Qué            1.30
ouls           1.28
Nanto          1.28
presence       1.21
witnesses      1.18
parentheses    1.17
Activations Density: 0.003%