Negative Logits
Defense         0.40
NIG             0.40
IGENCE          0.39
weisung         0.39
Intelligence    0.38
informé         0.37
خول             0.36
нету            0.36
distributional  0.36
Requirement     0.36
Positive Logits
program         0.45
CIA             0.44
rentices        0.43
メン            0.40
პროგრამ         0.40
অ্য             0.39
變              0.39
Program         0.39
plausible       0.39
läk             0.39
Activations Density: 0.002%