Explanations
concepts and their consequences
Negative Logits
kend        0.51
kable       0.50
dé          0.49
november    0.47
хода        0.47
fim         0.45
сверх       0.45
cet         0.44
ba          0.44
механизм    0.44
Positive Logits
ა               0.46
জ               0.43
possibili       0.41
мам             0.41
retaining       0.40
oportunidades   0.40
Minnie          0.40
ες              0.40
مین             0.40
kemungkinan     0.40
Activations Density: 0.004%