INDEX
NEGATIVE LOGITS
Token         Logit
serendip      0.41
踌            0.40
Hes           0.39
രണം           0.38
DISCLAIM      0.38
াজী           0.37
romptu        0.37
ilage         0.36
唏            0.36
ahl           0.36
POSITIVE LOGITS
Token         Logit
threat        4.03
Threat        3.86
threats       3.84
threat        3.73
Threat        3.67
Threats       3.61
威胁          3.55
threaten      3.53
amea          3.42
threatening   3.31
Activations Density: 0.068%