Explanations
phrases that suggest the act of avoiding or evading something
Negative Logits
Token         Logit
eters         -0.15
volta         -0.15
Exiting       -0.15
FieldValue    -0.14
alf           -0.14
?key          -0.14
/Dk           -0.14
vui           -0.14
Levin         -0.13
aybe          -0.13
Positive Logits
Token         Logit
ance          0.18
lingen        0.16
/mit          0.16
rette         0.15
obus          0.15
Avoid         0.15
Avoid         0.15
saturation    0.14
avoid         0.14
882           0.14
Activations Density: 0.013%