INDEX
Explanations
prepositions used in various contexts
Head Attr Weights
0:0.02
1:0.01
2:0.10
3:0.05
4:0.06
5:0.02
6:0.22
7:0.28
8:0.03
9:0.03
10:0.06
11:0.07
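As a reading aid, the head attribution weights listed above can be ranked to see which heads contribute most. This is a minimal sketch that only re-enters the twelve values shown; the names `head_weights`, `ranked`, `top_two`, and `total` are hypothetical and not part of the dashboard or any tool's API.

```python
# Head index -> attribution weight, copied from the list above.
head_weights = {
    0: 0.02, 1: 0.01, 2: 0.10, 3: 0.05, 4: 0.06, 5: 0.02,
    6: 0.22, 7: 0.28, 8: 0.03, 9: 0.03, 10: 0.06, 11: 0.07,
}

# Sort heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)

# Indices of the two heads with the largest weights.
top_two = [head for head, _ in ranked[:2]]

# Sum of the listed (rounded) weights; it need not equal exactly 1.
total = sum(head_weights.values())

print("top heads:", top_two)
print("sum of listed weights:", round(total, 2))
```

With the values as listed, heads 7 and 6 carry the largest weights, and the rounded weights sum to about 0.95.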
Negative Logits
agall        -2.02
humility     -1.71
idity        -1.50
simplicity   -1.50
¯            -1.44
disclaim     -1.43
patience     -1.42
diplomacy    -1.42
bara         -1.41
realism      -1.40
Positive Logits
stats        1.48
agues        1.40
Statistical  1.40
ogyn         1.39
REE          1.34
scribed      1.34
athlet       1.31
hack         1.29
existent     1.26
Rak          1.25
Activations Density 0.003%