INDEX
Explanations
self-preservation, minimizing losses
Negative Logits
یم  0.52
ere  0.47
ie  0.46
mumkin  0.45
Soc  0.44
bea  0.43
possible  0.43
l  0.43
yla  0.42
en  0.42
Positive Logits
ресторан  0.61
лизи  0.58
酒店  0.56
蛋白質  0.52
НЕ  0.51
SHOP  0.49
污水  0.49
收费  0.48
𝘢  0.48
restaurante  0.48
Activations Density: 0.003%