INDEX
Explanations
items described with adjectives
Negative Logits
тре         0.54
milhões     0.52
аны         0.48
ру          0.47
}$          0.47
рен         0.46
лен         0.45
猀          0.45
obligado    0.45
тическая    0.44
Positive Logits
four        0.53
eatery      0.49
pointing    0.48
goal        0.47
inthe       0.46
‼           0.45
sp          0.45
ARM         0.45
fight       0.44
Synchron    0.44
Activations Density: 0.001%