Explanations
words and phrases indicating actions or processes related to bypassing or circumventing something
Head Attribution Weights (head:weight)
0:0.05
1:0.02
2:0.21
3:0.04
4:0.28
5:0.04
6:0.02
7:0.03
8:0.11
9:0.08
10:0.05
11:0.02
Negative Logits
ufact     -1.62
isf       -1.44
aughters  -1.39
birds     -1.38
atures    -1.28
cryst     -1.28
akeru     -1.28
cill      -1.25
embr      -1.24
advoc     -1.24
Positive Logits
charms     1.41
atche      1.36
charm      1.33
ּ          1.24
Rect       1.21
Bastard    1.19
Thumbnail  1.18
Hole       1.17
mistaken   1.17
Vaughn     1.16
Activations Density: 0.002%