Explanations
words related to legal or political issues
Negative Logits
Token      Logit
Niet       -0.72
Seym       -0.72
Moroc      -0.70
Instr      -0.67
Rica       -0.66
Fas        -0.62
ãĥ¼ãĥĨ     -0.62
Berm       -0.61
advoc      -0.61
shenan     -0.58
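
The token ãĥ¼ãĥĨ in the list above looks like encoding damage, but it may simply be the vocabulary form of a byte-level BPE token. The sketch below assumes the model uses a GPT-2 style byte-to-unicode table; the page does not state which tokenizer is in use, so treat that as an assumption. The helper name decode_vocab_token is hypothetical and only for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2 byte-level BPE.

    Assumption: the model behind this page uses this mapping.
    """
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and above.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table: vocabulary character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_vocab_token(token: str) -> str:
    """Hypothetical helper: turn a vocabulary-form token back into text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A token may hold only part of a multi-byte character, so do not fail hard.
    return raw.decode("utf-8", errors="replace")


print(decode_vocab_token("ãĥ¼ãĥĨ"))  # → ーテ
```

Under that assumption the entry corresponds to the two katakana characters ーテ, while plain ASCII tokens such as Niet decode to themselves.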
Positive Logits
Token      Logit
)))        0.71
);         0.67
};         0.65
][         0.64
));        0.61
"?         0.60
());       0.59
))         0.59
·          0.59
});        0.59
Activations Density 0.278%