INDEX
Explanations
phrases related to different groups or communities
references to the concept of community or groups within a specific context
Negative Logits
prol        -0.74
oldemort    -0.71
emb         -0.71
chio        -0.69
di          -0.67
hod         -0.67
ivers       -0.66
oulos       -0.66
iris        -0.65
Clar        -0.65
Positive Logits
peers       0.81
whom        0.79
Īè          0.77
them        0.74
those       0.74
warts       0.73
us          0.71
onlook      0.70
among       0.70
wart        0.68
Activations Density: 0.029%