INDEX
Explanations
common articles and prepositions that indicate locations or relationships
Head Attr Weights
0:0.01
1:0.01
2:0.04
3:0.04
4:0.15
5:0.02
6:0.14
7:0.35
8:0.03
9:0.03
10:0.06
11:0.06
Negative Logits
wont          -1.42
rats          -1.33
itsch         -1.32
essentials    -1.30
ilitarian     -1.30
lawy          -1.30
avers         -1.27
ér            -1.27
iferation     -1.27
jab           -1.25
Positive Logits
fray          2.01
phabet        1.69
orbit         1.65
estamp        1.51
ulia          1.50
captcha       1.50
ranks         1.49
Via           1.47
Wonderland    1.38
Corps         1.38
Activations Density 0.016%