NEGATIVE LOGITS
interpreting    0.47
تفسیر    0.44
swallowing      0.42
tü              0.41
头的            0.41
开发            0.40
پشتی    0.40
func            0.39
Derivatives     0.39
unlocking       0.39
POSITIVE LOGITS
samples         0.85
sampled         0.83
samples         0.78
randomly        0.77
sampling        0.75
populations     0.73
subpopulations  0.72
sampled         0.72
样本            0.71
random          0.70
ACTIVATIONS DENSITY: 0.139%