Explanations
expressions of personal identity and self-reflection
Head Attribution Weights
0:0.05
1:0.05
2:0.06
3:0.11
4:0.03
5:0.05
6:0.03
7:0.24
8:0.04
9:0.02
10:0.24
11:0.03
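The twelve head weights listed above sum to 0.95 rather than 1.00. That gap is presumably rounding, since each value is shown to two decimals, but the dashboard does not say how the weights are normalized. Heads 7 and 10 (0.24 each) together carry about half of the listed total. The sketch below ranks heads and computes that share from the values exactly as listed. The names `HEAD_ATTR_WEIGHTS`, `rank_heads`, and `share_of_top` are hypothetical helpers written for illustration.

```python
# Head attribution weights copied from the dashboard (head index -> weight).
HEAD_ATTR_WEIGHTS = {
    0: 0.05, 1: 0.05, 2: 0.06, 3: 0.11, 4: 0.03, 5: 0.05,
    6: 0.03, 7: 0.24, 8: 0.04, 9: 0.02, 10: 0.24, 11: 0.03,
}


def rank_heads(weights, top_k=3):
    """Return the top_k (head, weight) pairs, largest weight first.

    Ties are broken by the lower head index so the order is deterministic.
    """
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def share_of_top(weights, top_k):
    """Fraction of the listed total weight carried by the top_k heads."""
    total = sum(weights.values())
    top = sum(w for _, w in rank_heads(weights, top_k))
    return top / total


if __name__ == "__main__":
    print(rank_heads(HEAD_ATTR_WEIGHTS))               # [(7, 0.24), (10, 0.24), (3, 0.11)]
    print(round(sum(HEAD_ATTR_WEIGHTS.values()), 2))   # 0.95
    print(f"{share_of_top(HEAD_ATTR_WEIGHTS, 2):.1%}") # share held by heads 7 and 10
```

Normalizing by the listed total rather than by 1.0 keeps the share meaningful even if the displayed weights were rounded.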
Negative Logits
escription    -2.43
ergy          -2.38
rafted        -2.34
withd         -2.29
ociated       -2.27
rompt         -2.27
cellaneous    -2.25
ibus          -2.23
separately    -2.21
explor        -2.19
Positive Logits
ALWAYS        3.71
invariably    3.00
always        2.87
always        2.77
forever       2.76
wont          2.62
NEVER         2.52
anytime       2.48
inevitably    2.47
everywhere    2.39
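Feature dashboards of this kind commonly build the positive and negative logit lists with a "logit lens" on the feature. They take the feature's output direction in the residual stream, multiply it by the model's unembedding matrix, and report the largest and smallest entries. This dashboard does not state its exact procedure, so the following points are assumptions: that it uses this method, and how any layer-norm folding or mapping of a per-head attention output through the output projection is handled. The sketch below uses toy random matrices. `top_logit_effects`, the toy vocabulary, and the matrix sizes are hypothetical.

```python
import numpy as np


def top_logit_effects(direction, W_U, vocab, k=10):
    """Project a residual-stream direction onto the vocabulary.

    Assumes `direction` has shape (d_model,) and `W_U` has shape
    (d_model, d_vocab). Returns the k most positive and k most negative
    (token, logit) pairs.
    """
    logits = direction @ W_U                  # shape: (d_vocab,)
    order = np.argsort(logits)                # ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_vocab = 8, 20
    vocab = [f"tok{i}" for i in range(d_vocab)]  # hypothetical toy vocabulary
    W_U = rng.normal(size=(d_model, d_vocab))    # toy unembedding matrix
    feature_dir = rng.normal(size=d_model)       # toy feature output direction
    pos, neg = top_logit_effects(feature_dir, W_U, vocab, k=3)
    print("positive:", pos)
    print("negative:", neg)
```

Tokens are printed without leading whitespace. Two distinct vocabulary entries can therefore render identically, which is presumably why `always` appears twice in the positive list.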
Activation Density: 0.017%
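An activation density of 0.017% means the feature fires on about 17 of every 100,000 token positions, or roughly 1 in 6,000. The sketch below assumes density is defined as the fraction of sampled token positions where the activation exceeds zero. The dashboard does not state its threshold or sample, so both are assumptions, and `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions at which the activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.count_nonzero(acts > threshold)) / acts.size


if __name__ == "__main__":
    acts = np.zeros(100_000)
    acts[:17] = 1.0                               # 17 firing positions out of 100,000
    print(f"{activation_density(acts):.3%}")      # 0.017%
```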