INDEX
Explanations
phrases related to making sense or being meaningful
phrases that express the concept of making a difference or sense
Head Attr Weights
0:0.04
1:0.01
2:0.22
3:0.08
4:0.04
5:0.05
6:0.01
7:0.06
8:0.20
9:0.06
10:0.11
11:0.06
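
The head attribution weights above can be ranked to see which heads contribute most to this feature. The snippet below is only a sketch: the values are copied from the list above, while the names `head_weights` and `top_heads` are hypothetical and not part of the dashboard. The listed weights sum to 0.94 rather than 1.00, which is assumed here to be a rounding effect of the two-decimal display.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_weights = {
    0: 0.04, 1: 0.01, 2: 0.22, 3: 0.08, 4: 0.04, 5: 0.05,
    6: 0.01, 7: 0.06, 8: 0.20, 9: 0.06, 10: 0.11, 11: 0.06,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights (hypothetical helper)."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


total = sum(head_weights.values())           # sum of the displayed weights
top3 = top_heads(head_weights)                # highest-weighted heads
top3_share = sum(w for _, w in top3) / total  # share of the displayed total

print(top3)
print(f"total={total:.2f}, top-3 share={top3_share:.1%}")
```

On these numbers, heads 2, 8, and 10 carry a little over half of the displayed attribution.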
Negative Logits
anwhile      -1.11
Pigs         -1.03
rought       -1.00
stration     -0.97
planes       -0.97
idth         -0.95
Boo          -0.93
initiated    -0.93
faithfully   -0.92
Badge        -0.91
Positive Logits
Wan          1.49
kish         1.33
�            1.21
undo         1.20
sense        1.20
ˈ            1.19
dent         1.11
_-_          1.08
Package      1.07
pmwiki       1.07
Activations Density 0.201%