Explanations
rebel and rebellion against
Negative Logits
Token    Logit
ра       1.22
of       0.98
ي        0.98
ी        0.97
не       0.95
и        0.91
ла       0.91
ला       0.88
p        0.87
و        0.84
Positive Logits
Token        Logit
rebell       0.92
rebels       0.87
rebel        0.86
Rebellion    0.81
বিদ্রোহ      0.77   (Bengali: "rebellion")
revolt       0.76
विद्रोह      0.73   (Hindi: "rebellion")
rebellion    0.71
Rebels       0.71
Rebel        0.68
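The page does not say how the two logit lists are produced. On SAE feature dashboards they are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and reading off the highest and lowest entries. The sketch below assumes that method. `decoder_direction`, `W_U`, `vocab` and `top_logit_effects` are hypothetical names, and the toy arrays are illustrative only, not this feature's weights.

```python
import numpy as np

def top_logit_effects(decoder_direction, W_U, vocab, k=10):
    """Return the k most positive and k most negative tokens for a feature.

    decoder_direction: shape (d_model,), hypothetical SAE decoder row.
    W_U: shape (d_model, d_vocab), hypothetical unembedding matrix.
    vocab: list of d_vocab token strings.
    """
    # Direct effect of the feature direction on every output token's logit.
    logits = decoder_direction @ W_U
    order = np.argsort(logits)  # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary (made up).
vocab = ["rebel", "revolt", "of", "p"]
W_U = np.array([[1.0, 0.5, -0.2, -1.0],
                [0.0, 0.5,  0.3,  0.1]])
direction = np.array([1.0, 0.0])
pos, neg = top_logit_effects(direction, W_U, vocab, k=2)
print(pos)
print(neg)
```

Sorting once with `np.argsort` and slicing both ends yields the positive and negative lists from a single pass over the vocabulary.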
Activation density: 0.007%
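A density of 0.007% means 0.00007 as a fraction, or roughly 7 tokens in every 100,000. The sketch below assumes density is defined as the percentage of sampled tokens on which the feature's activation is positive. The page does not state the exact definition or the token sample it uses, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    activations = np.asarray(activations)
    return 100.0 * np.count_nonzero(activations > threshold) / activations.size

# Hypothetical sample: 100,000 tokens, the feature fires on 7 of them.
acts = np.zeros(100_000)
acts[:7] = 1.3
print(f"{activation_density(acts):.3f}%")
```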