INDEX
Explanations
the word "solely" indicating exclusivity or limitation
Negative Logits
Nurs          -0.75
ctl           -0.64
LI            -0.64
Mushroom      -0.63
lyn           -0.63
most          -0.62
shift         -0.62
Neighbor      -0.61
Archbishop    -0.61
dl            -0.61
Positive Logits
focused        0.94
reliant        0.87
relying        0.86
foc            0.86
responsible    0.85
rely           0.74
catering       0.73
focusing       0.72
relied         0.71
solely         0.69
Activations Density: 0.010%