Explanations
family life and connections
Negative Logits
lep  0.39
𝐋  0.38
pruned  0.37
ด  0.36
romagnet  0.36
mis  0.35
площа  0.35
রপ্ত  0.35
骋  0.35
儿子  0.35
Positive Logits
members  0.75
👪  0.71
member  0.64
成員  0.61
Family  0.60
Mitglieder  0.57
Members  0.57
成员  0.56
Family  0.56
membros  0.55
Activations Density 0.014%