INDEX
Explanations
collective experiences and shared feelings among groups of people
NEGATIVE LOGITS
Our       -0.24
our       -0.23
наших     -0.22
Our       -0.20
our       -0.18
naše      -0.18
nostro    -0.17
我们的    -0.17
.Our      -0.17
nossa     -0.17
POSITIVE LOGITS
us        1.05
Us        0.74
us        0.66
-us       0.64
Us        0.61
_us       0.57
.us       0.56
(us       0.56
us        0.53
/us       0.50
Activations Density 0.232%
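
The density figure is presumably the share of sampled token positions on which the feature fires at all. The dashboard does not state how it is computed, so the following is only a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.232% would mean the feature is active on roughly 2 to 3 of every 1,000 sampled tokens.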