Explanations
specific entities or numbers
Negative Logits
ฺ            0.46
Kapoor       0.44
Má           0.44
við          0.43
seseorang    0.42
sailboat     0.41
kredit       0.41
chcesz       0.40
ukaz         0.40
Machinist    0.40
Positive Logits
ale          0.51
p            0.45
w            0.43
their        0.43
ус           0.42
tained       0.42
efa          0.42
rem          0.41
S            0.41
ée           0.41
Activation Density: 0.012%
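The density figure most likely reports the share of token positions on which this feature fires. The sketch below shows how such a percentage could be computed. It assumes that "fires" means an activation strictly above zero. The function name `activation_density`, its `threshold` parameter, and the example data are hypothetical and do not come from the dashboard's own code.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Hypothetical helper: assumes density = (# active positions) / (# positions) * 100.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 12 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(12):
    acts[i * 1000] = 1.5

print(f"{activation_density(acts):.3f}%")
```

At this scale a density of 0.012% means roughly 12 firings per 100,000 tokens. That rarity fits the narrow explanation of "specific entities or numbers".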