NEGATIVE LOGITS
Cardiff       0.39
schwier       0.39
艰            0.38
constraining  0.37
東西          0.37
रीजन          0.36
Taxonomy      0.36
foreclose     0.36
Hend          0.36
የሚ            0.36
POSITIVE LOGITS
init          0.47
sep           0.46
init          0.45
fit           0.44
arnya         0.43
Init          0.42
heated        0.40
ceased        0.39
Sep           0.38
sep           0.38
Activations Density: 0.002%