INDEX
Explanations
percentages followed by saying
Negative Logits
%          0.54
(<         0.45
outcrops   0.45
varies     0.45
लगभग       0.43
undergoes  0.43
bouts      0.42
உத         0.42
आय         0.41
排序       0.41
Positive Logits
saying      0.51
mengatakan  0.50
认为        0.45
утвержда    0.45
insisting   0.45
insisted    0.44
that        0.43
insists     0.42
omo         0.42
said        0.42
Activations Density 0.003%