NEGATIVE LOGITS
unaltered    0.40
瑚    0.38
indist    0.38
unchanged    0.36
くと    0.35
韻    0.35
рованием    0.34
ਾਰੇ    0.34
人都    0.34
інки    0.34
POSITIVE LOGITS
proactively    0.74
caused    0.73
caused    0.64
efficacement    0.62
efficiently    0.60
путем    0.60
causada    0.59
threats    0.58
migliorare    0.55
intelligently    0.53
Activations Density: 0.039%