INDEX
Explanations
describes actions and roles
Negative Logits
ش  0.71
ש  0.69
V  0.68
K  0.61
S  0.60
Z  0.60
م  0.59
Bundes  0.58
N  0.57
ه  0.57
Positive Logits
does  0.55
clickView  0.51
уче  0.50
pago  0.49
μα  0.49
psychologist  0.48
respiratory  0.48
Does  0.47
Psychologist  0.46
театр  0.45
Activations Density: 0.000%