NEGATIVE LOGITS
Token          Value
件事           0.44
ENCIA          0.44
IMPORTANT      0.41
ização         0.39
попы           0.38
誼             0.38
प्रयास          0.38
نگاه           0.38
Petra          0.38
вопроса        0.38
POSITIVE LOGITS
Token          Value
liar           0.61
hostages       0.60
coward         0.59
victim         0.59
believers      0.59
believer       0.58
vegetarians    0.57
drinker        0.57
millionaire    0.55
stranger       0.55
Activation density: 0.025%