INDEX
Explanations
articles and prepositions indicating context
Head Attr Weights
0:0.02
1:0.01
2:0.10
3:0.07
4:0.20
5:0.05
6:0.06
7:0.18
8:0.05
9:0.05
10:0.08
11:0.07
Negative Logits
transitions: -1.76
interacted: -1.60
humans: -1.57
interacts: -1.55
simulated: -1.52
duction: -1.51
instein: -1.48
processes: -1.48
gov: -1.44
sequence: -1.43
Positive Logits
$$$$: 1.81
cheers: 1.64
ウス: 1.60
Mechdragon: 1.54
Vanity: 1.49
使: 1.47
ォ: 1.47
HY: 1.46
nickname: 1.45
Gund: 1.43
Activations Density 0.000%