Negative Logits
преступ    0.48
зло    0.45
कर्फ    0.44
लक    0.43
惆    0.43
мер    0.43
Wiltshire    0.43
crime    0.42
নৃত্য    0.41
Detecting    0.41
Positive Logits
training    0.43
Saudi    0.42
flush    0.40
Praxis    0.40
Osama    0.39
জিহ    0.38
jihad    0.37
bedded    0.37
marinated    0.37
TRAINING    0.37
Activations Density: 0.015%