INDEX
Explanations
concepts related to belonging and membership within communities or groups
NEGATIVE LOGITS
raf         -0.17
buz         -0.14
Lifestyle   -0.14
omu         -0.14
arrang      -0.14
ofs         -0.14
riba        -0.13
rico        -0.13
uky         -0.13
ào          -0.13
POSITIVE LOGITS
belong      0.66
belongs     0.66
belonged    0.63
belongs     0.58
bel         0.56
Bel         0.55
pert        0.54
належ       0.51
gehört      0.48
属于        0.48
Activations Density 0.126%