Explanations
names of individuals and their relationships or associations with different contexts
Negative Logits
d    -0.49
D    -0.44
د    -0.41
ד    -0.39
দ    -0.37
Д    -0.36
da   -0.36
DA   -0.35
Da   -0.34
ड    -0.34
Positive Logits
dan   1.46
don   1.38
ders  1.28
dog   1.26
dr    1.24
din   1.24
dam   1.23
done  1.23
dor   1.22
down  1.18
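The two lists above rank output tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists can be produced, assuming (as is common for sparse-autoencoder dashboards, but not stated on this page) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector, ignoring any final layer norm. The function names `logit_effects` and `top_logits` and the toy vectors are hypothetical illustrations, not this dashboard's actual code.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed scoring: dot product of the feature's decoder direction
    with each token's unembedding vector (hypothetical helper)."""
    return {tok: sum(a * b for a, b in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_logits(effects: Dict[str, float], k: int = 10
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example with a 2-dimensional residual stream (made-up numbers).
effects = logit_effects([1.0, 0.0],
                        {"dan": [1.5, 0.2], "d": [-0.5, 3.0], "x": [0.1, 0.0]})
negative, positive = top_logits(effects, k=1)
print(negative, positive)  # [('d', -0.5)] [('dan', 1.5)]
```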
Activation Density: 0.297%
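Activation density is read here as the percentage of token positions on which the feature fires; a value of 0.297% corresponds to roughly 297 of every 100,000 tokens. A minimal sketch of that calculation, assuming "fires" means an activation strictly above zero (the dashboard's exact threshold and sampling set are not given on this page). The function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not a documented setting.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return 100.0 * active / total if total else 0.0


# Toy example: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 2.5, 0.0]))  # 25.0
```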