INDEX
NEGATIVE LOGITS
।,
0.48
drunkenness
0.42
壊
0.38
аўта
0.38
]^{-
0.37
preocupaciones
0.36
]=='
0.36
rumores
0.36
stench
0.35
屢
0.35
POSITIVE LOGITS
interesting
0.67
interested
0.66
interes
0.64
consider
0.59
интересно
0.58
interesa
0.57
интерес
0.57
interessante
0.56
interessant
0.56
interessiert
0.56
Activations Density 0.000%