INDEX
Negative Logits
Westminster    0.45
ophil          0.40
Nawab          0.39
탕             0.39
wicz           0.38
Verne          0.37
gastronomic    0.36
Nap            0.36
τών            0.36
zeni           0.36
Positive Logits
unsafe         0.57
Boeing         0.55
safety         0.55
सेफ्टी          0.54
inspections    0.52
software       0.52
软件           0.51
safety         0.50
deleg          0.50
Deleg          0.49
Activations Density: 0.002%