INDEX
Explanations
instances where individuals reflect on their thoughts and feelings
Head Attr Weights
0:0.01
1:0.02
2:0.04
3:0.06
4:0.08
5:0.02
6:0.04
7:0.48
8:0.03
9:0.03
10:0.07
11:0.06
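
The head attribution weights above can be read programmatically. The following is a minimal sketch, assuming the `head:weight` line format shown in the listing; the names `parse_head_weights`, `raw`, and `dominant_head` are hypothetical and do not come from the source. Whether the weights are meant to sum to 1 is not stated, so the total is only reported, not enforced.

```python
# Head attribution weights copied verbatim from the listing above.
# Assumption: each line is "<head index>:<weight>".
raw = """0:0.01
1:0.02
2:0.04
3:0.06
4:0.08
5:0.02
6:0.04
7:0.48
8:0.03
9:0.03
10:0.07
11:0.06
"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a dict of int -> float."""
    weights = {}
    for line in text.strip().splitlines():
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(raw)

# Head with the largest attribution weight.
dominant_head = max(weights, key=weights.get)

# Sum of the listed (rounded) weights.
total = sum(weights.values())

print(dominant_head, weights[dominant_head], round(total, 2))
```

On the values listed, head 7 carries the largest weight (0.48), and the rounded weights sum to 0.94.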
Negative Logits
mouth:-1.69
ante:-1.65
adra:-1.62
Torrent:-1.46
tumblr:-1.39
ected:-1.36
fed:-1.36
eatured:-1.35
aredevil:-1.35
ioxide:-1.32
Positive Logits
aloud:1.85
hypot:1.62
nostalg:1.57
captcha:1.49
possibilities:1.40
nervously:1.39
hov:1.39
visions:1.38
paren:1.37
how:1.36
Activations Density 0.101%