INDEX
Explanations
control over actions or systems
NEGATIVE LOGITS
Token    Value
يي    0.63
ی    0.56
الك    0.55
افي    0.55
заяви    0.54
)^*    0.54
اين    0.53
детям    0.53
IZING    0.53
اكم    0.53
POSITIVE LOGITS
Token    Value
control    1.16
Control    1.14
Control    1.08
controll    1.02
control    1.00
kontroll    0.96
控制    0.95
controlled    0.92
kontrol    0.92
CONTROL    0.90
Activations Density: 0.103%