Explanations
political leaders and their statements
Negative Logits
neon          0.74
minimal       0.67
large         0.64
miniature     0.63
average       0.62
female        0.62
huge          0.62
all           0.61
nearby        0.61
an            0.61
Positive Logits
сказал        0.79
заявил        0.76
powiedział    0.75
cautioned     0.71
وقال          0.69
ський         0.68
ствовать      0.67
ಹೇಳಿದರು       0.67
считает       0.66
:“            0.66
Activations Density: 0.006%