INDEX
Explanations
phrases indicating possession or association
Head Attr Weights
0:0.02
1:0.01
2:0.08
3:0.04
4:0.12
5:0.02
6:0.04
7:0.40
8:0.04
9:0.02
10:0.04
11:0.11
Negative Logits
Token          Logit
hop            -1.85
bows           -1.80
ø              -1.60
stown          -1.55
Bought         -1.53
hell           -1.50
hest           -1.50
ł              -1.49
ouched         -1.48
arnaev         -1.47
Positive Logits
Token          Logit
pse            1.95
behavi         1.86
orally         1.75
authoritative  1.74
narr           1.72
AUTH           1.72
覚醒           1.68
unbeliev       1.65
histories      1.63
millenn        1.61
Activations Density: 0.000%