INDEX
Explanations
words related to cognition or mindful awareness
Head Attr Weights
0:0.02
1:0.03
2:0.08
3:0.28
4:0.02
5:0.02
6:0.09
7:0.11
8:0.04
9:0.08
10:0.05
11:0.12
Negative Logits
glomer       -1.36
ocre         -1.29
irlf         -1.29
rimp         -1.27
worms        -1.17
waste        -1.17
lineback     -1.17
worms        -1.14
scrimmage    -1.14
deb          -1.14
Positive Logits
ances        1.19
Genocide     1.12
ausp         1.11
Objects      1.11
Klu          1.10
UE           1.08
Viper        1.08
Oval         1.05
swear        1.02
Platinum     1.01
Activations Density 0.001%