INDEX
Explanations
phrases related to impossibility or limitations
Negative Logits
Token      Logit
ery        -0.79
ãĥī        -0.78
quer       -0.76
rolled     -0.75
roller     -0.73
mon        -0.71
ura        -0.70
ãĥĺ        -0.69
ety        -0.69
late       -0.67
Positive Logits
Token          Logit
knowing        0.92
risking        0.92
sacrificing    0.88
recourse       0.84
compromising   0.82
encountering   0.79
mentioning     0.76
regard         0.75
seeing         0.70
adequate       0.69
Activations Density: 0.040%
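
The page does not say how the density figure is computed. A common reading is the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and not taken from the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature is active.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature is active on 4 of 10,000 token positions.
sample = [0.0] * 9996 + [1.3, 0.2, 2.7, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.040% would correspond to roughly 4 active positions per 10,000 tokens, which suggests a sparse feature.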