INDEX
Explanations
instructional phrases related to programming or application development
Head Attr Weights
0:0.07
1:0.04
2:0.07
3:0.18
4:0.06
5:0.20
6:0.04
7:0.06
8:0.05
9:0.05
10:0.10
11:0.03
Negative Logits
ONSORED       -3.04
workshop      -2.79
geist         -2.44
gallery       -2.38
misogyny      -2.36
instructors   -2.32
"))           -2.31
textbook      -2.29
Episode       -2.28
instructor    -2.28
Positive Logits
encrypt        2.86
iping          2.82
transferring   2.76
acterial       2.55
ultan          2.54
onym           2.54
harvesting     2.31
revoke         2.29
wiping         2.29
encrypted      2.23
Activations Density 0.004%