Explanations
words related to physical or metaphorical obstacles
references to difficulties or challenges
Negative Logits
ulet        -0.84
elt         -0.78
ribution    -0.75
erd         -0.72
otide       -0.69
akin        -0.68
akening     -0.68
Ann         -0.67
monds       -0.67
esome       -0.67
Positive Logits
obstacle    1.35
obstacles   1.26
hurdles     0.95
imped       0.91
barriers    0.89
pursu       0.84
vanquished  0.82
impede      0.80
hurdle      0.80
undermin    0.80
Activation Density: 0.010%