INDEX
Explanations
word beginnings
Negative Logits
ik            0.75
and           0.74
1             0.64
на            0.63
ಾ             0.63
has           0.62
の            0.62
im            0.61
or            0.61
н             0.58
Positive Logits
。            0.48
palpable      0.46
perceptible   0.45
۔             0.44
flapping      0.43
              0.42
odynam        0.41
berkaitan     0.40
gue           0.38
distracting   0.38
Activations Density 0.304%