Explanations
words related to divisions or distinctions between different groups or entities
terms related to divisions or gaps between groups or concepts
Negative Logits
oken     -0.80
leeve    -0.76
vez      -0.73
OD       -0.72
elin     -0.70
psc      -0.70
undai    -0.68
agara    -0.67
elta     -0.63
zin      -0.62
Positive Logits
naire         0.89
divides       0.80
divide        0.77
between       0.76
pits          0.76
dividing      0.71
separating    0.70
Divide        0.70
Beir          0.69
wid           0.69
Activations Density 0.026%