Negative Logits
causing      0.85
께           0.75
仍在         0.70
inducing     0.69
ではない     0.69
induced      0.69
нрави        0.68
τζ           0.68
causing      0.68
da           0.67
Positive Logits
trip         0.86
ಮಾಡಿದ        0.76
dibuat       0.75
repentance   0.75
programu     0.72
stroke       0.72
configur     0.72
victory      0.72
stok         0.70
existence    0.70
Activation density: 0.031%
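The panel above reports two kinds of summary: ranked token lists for the feature's strongest negative and positive logit effects, and an activation density. The following is a minimal sketch of how such numbers could be derived. It assumes you already have a hypothetical token-to-logit-effect map (for example, the feature's output direction projected onto the unembedding) and a list of the feature's per-token activations. The helper names `top_logit_effects` and `activation_density` are illustrative, not part of any dashboard's API.

```python
from typing import Dict, List, Sequence, Tuple


def top_logit_effects(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive logit effects.

    `effects` is a hypothetical map from vocabulary token to the change in
    that token's logit per unit of feature activation (an assumption here).
    """
    # Sort ascending so the most negative effects come first.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    # Most negative first; drop entries that are not actually negative.
    negative = [(tok, val) for tok, val in ranked[:k] if val < 0]
    # Most positive first; drop entries that are not actually positive.
    positive = [(tok, val) for tok, val in reversed(ranked[-k:]) if val > 0]
    return negative, positive


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens on which the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical signed effects, loosely modeled on tokens from the lists above.
    effects = {"trip": 0.86, "victory": 0.72, "causing": -0.85, "da": -0.67, "the": 0.0}
    neg, pos = top_logit_effects(effects, k=2)
    print(neg)  # [('causing', -0.85), ('da', -0.67)]
    print(pos)  # [('trip', 0.86), ('victory', 0.72)]

    # 3 active tokens out of 10,000 gives a density of 0.03%.
    acts = [0.0] * 9997 + [1.2, 0.5, 3.0]
    print(activation_density(acts))
```

Signed values are kept in this sketch for clarity. The dashboard's negative list appears to show magnitudes only, so a display layer would likely print `abs(val)`.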