Explanations
phrases related to shared experiences and common beliefs within communities
Negative Logits
Token          Logit
remainder      -0.20
ضل             -0.17
ugi            -0.16
QA             -0.15
mons           -0.14
.schedulers    -0.13
chet           -0.13
ůst            -0.13
aten           -0.13
hiro           -0.13
Positive Logits
Token          Logit
common         0.76
shared         0.66
common         0.61
shared         0.59
Common         0.58
COMMON         0.54
-common        0.53
Shared         0.53
Common         0.52
Shared         0.51
Activations Density 0.184%
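
The density figure above can be read as the share of token positions on which the feature fires. The page does not state the exact definition, so the following is a minimal sketch under the assumption that density is the fraction of positions whose activation exceeds a threshold (here 0). The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is
    active; the source page does not spell out its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 184 of them with a nonzero activation.
sample = [0.0] * 99_816 + [1.5] * 184
density = activation_density(sample)
print(f"Activations Density {density:.3%}")  # prints "Activations Density 0.184%"
```

Under this reading, a density of 0.184% would mean the feature is active on roughly 1 in 540 token positions, which fits a feature tied to a narrow set of words such as "common" and "shared".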