INDEX
Explanations
phrases indicating negative or cautionary scenarios, often linked to failures or challenges
Head Attr Weights
0:0.02
1:0.02
2:0.11
3:0.07
4:0.02
5:0.02
6:0.15
7:0.11
8:0.07
9:0.05
10:0.05
11:0.24
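As a reading aid, the head attribution weights above can be ranked to see which attention heads contribute most to this feature. This is only a minimal sketch: the dictionary name `head_attr_weights` and the helper `rank_heads` are hypothetical, the values are copied from the list above, and nothing here reflects how the dashboard itself computes or normalizes the weights.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# The variable and function names are illustrative assumptions, not dashboard API.
head_attr_weights = {
    0: 0.02, 1: 0.02, 2: 0.11, 3: 0.07, 4: 0.02, 5: 0.02,
    6: 0.15, 7: 0.11, 8: 0.07, 9: 0.05, 10: 0.05, 11: 0.24,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted by weight, largest first.

    Python's sort is stable, so heads with equal weight keep index order.
    """
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr_weights)
top_head, top_weight = ranked[0]
total = sum(head_attr_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of displayed weights: {total:.2f}")
```

The displayed weights are rounded to two decimals, so their sum need not be exactly 1.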
Negative Logits
wisely: -1.47
Koran: -1.29
boldly: -1.16
soever: -1.12
Yaz: -1.10
diligently: -1.08
loudly: -1.06
Doodle: -1.06
Pru: -1.06
Brune: -1.05
Positive Logits
ensive: 1.36
uclear: 1.33
rontal: 1.33
obyl: 1.28
ricular: 1.26
dozen: 1.23
death: 1.23
oval: 1.22
termin: 1.22
heric: 1.22
Activations Density 0.024%