Explanations
concepts related to belonging and social connections
Negative Logits
Token      Logit
Ãło        -0.16
.named     -0.15
cms        -0.14
.Interop   -0.14
raf        -0.14
ازÙĩ       -0.14
Barrier    -0.14
Bubble     -0.14
-len       -0.14
bubb       -0.13
Positive Logits
Token      Logit
pert       0.75
Pert       0.65
pert       0.61
Bel        0.57
bel        0.54
Bel        0.50
appart     0.48
BEL        0.47
gehört     0.47
належ      0.46
Activations Density: 0.144%