INDEX
Explanations
b, br, bx followed by specific characters
Negative Logits
ل    0.75
R    0.71
ש    0.68
Y    0.66
The    0.64
л    0.64
H    0.63
G    0.63
P    0.63
ל    0.62
Positive Logits
kort    0.56
is    0.54
ция    0.53
ి    0.52
ก    0.51
коли    0.50
బ    0.50
зки    0.50
Organizing    0.50
जिसके    0.49
Activations Density 0.482%