INDEX
Negative Logits
Can        0.55
Safety     0.54
möglichen  0.54
Impl       0.53
または     0.53
Suggest    0.52
Allows     0.52
Safety     0.51
Couldn     0.51
implies    0.51
Positive Logits
soared     0.93
continues  0.93
steadily   0.92
বেড়েছে     0.88
has        0.86
continue   0.84
увеличи    0.84
telah      0.83
meningkat  0.83
đã         0.79
Activations Density 0.004%