INDEX
NEGATIVE LOGITS
terrorism    0.42
carvings     0.42
handheld     0.41
extradition  0.39
颶           0.39
терро        0.39
Terrorism    0.38
रा           0.38
Eliz         0.38
undersea     0.37
POSITIVE LOGITS
系统的       0.43
berarti      0.41
unknown      0.41
ardon        0.41
systèmes     0.41
系統         0.41
system       0.40
systems      0.40
浠           0.39
Means        0.39
Activations Density: 0.003%