Explanations
names followed by words
names of people
Negative Logits
t     0.80
as    0.64
l     0.52
y     0.51
s     0.49
tt    0.49
ta    0.46
ت     0.46
ai    0.46
tint  0.46
Positive Logits
يل      0.51
یر      0.43
によると  0.41
ד       0.40
ции     0.40
hướng   0.39
ე       0.39
B       0.39
beş     0.39
şöyle   0.38
Activations Density: 13.489%